Skip Navigation

This Article
Right arrow FREE Full Text (Print PDF) Freely available
Right arrow FREE Full Text (Screen PDF)
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (5)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Li, W.
Right arrow Articles by Xiong, M.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Li, W.
Right arrow Articles by Xiong, M.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

Bioinformatics Vol. 19 no. 7 2003
Pages 811-817
© 2003 Oxford University Press

SamCluster: an integrated scheme for automatic discovery of sample classes using gene expression profile

Wuju Li 1,*, Ming Fan 1 and Momiao Xiong 2

1 Beijing Institute of Basic Medical Sciences, PO Box 130(3), Beijing 100850, People’s Republic of China
2 Human Genetics Center, School of Public Health, University of Texas-Houston Health Science Center, TX 77225, USA

Received on April 17, 2002 ; revised on October 27, 2002 and November 11, 2002 ; accepted on November 13, 2002

Motivation: Feature (gene) selection can dramatically improve the accuracy of gene expression profile based sample class prediction. Many statistical methods for feature (gene) selection such as stepwise optimization and Monte Carlo simulation have been developed for tissue sample classification. In contrast to class prediction, few statistical and computational methods for feature selection have been applied to clustering algorithms for pattern discovery.

Results: An integrated scheme and corresponding program SamCluster for automatic discovery of sample classes based on gene expression profile is presented in this report. The scheme incorporates the feature selection algorithms based on the calculation of CV (coefficient of variation) and t-test into hierarchical clustering and proceeds as follows. At first, the genes with their CV greater than the pre-specified threshold are selected for cluster analysis, which results in two putative sample classes. Then, significantly differentially expressed genes in the two putative sample classes with p-values <= 0.01, 0.05, or 0.1 from t-test are selected for further cluster analysis. The above processes were iterated until the two stable sample classes were found. Finally, the consensus sample classes are constructed from the putative classes that are derived from the different CV thresholds, and the best putative sample classes that have the minimum distance between the consensus classes and the putative classes are identified. To evaluate the performance of the feature selection for cluster analysis, the proposed scheme was applied to four expression datasets COLON, LEUKEMIA72, LEUKEMIA38, and OVARIAN. The results show that there are only 5, 1, 0, and 0 samples that have been misclassified, respectively. We conclude that the proposed scheme, SamCluster, is an efficient method for discovery of sample classes using gene expression profile.

Availability: The related program SamCluster is available upon request or from the web page http://www.sph.uth.tmc.edu:8052/hgc/Downloads.asp

Contact: wujuli{at}yahoo.com

* To whom correspondence should be addressed.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?




Disclaimer:
Please note that abstracts for content published before 1996 were created through digital scanning and may therefore not exactly replicate the text of the original print issues. All efforts have been made to ensure accuracy, but the Publisher will not be held responsible for any remaining inaccuracies. If you require any further clarification, please contact our Customer Services Department.