Abstract
In this paper, we present a gene selection method for cancer classification based on a genetic algorithm (GA) and support vector machines (SVM). First, the Wilcoxon rank sum test is used to filter out noisy and redundant genes in high-dimensional microarray data. Then, different subsets of highly informative genes are selected by GA/SVM using different training sets. The final subset, consisting of highly discriminating genes, is obtained by analyzing how frequently each gene appears across these subsets. The proposed method is tested on three open datasets: leukemia, breast cancer, and colon cancer data. The results show that the proposed method performs well in both gene selection and classification, especially on the breast cancer data, where it achieves 100% classification accuracy using only four genes.
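The filtering and frequency-analysis steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the dictionary data layout, the z-score cutoff, and the frequency threshold are all assumptions made for illustration, and the GA/SVM search itself is not shown (its output is represented by hypothetical lists of gene names). The rank sum statistic uses the normal approximation without a tie correction to the variance.

```python
from collections import Counter
from math import sqrt


def rank_sum_z(a, b):
    """Wilcoxon rank sum z statistic for samples a and b (normal
    approximation, average ranks for ties, no tie-corrected variance)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over the run of values tied with pooled[i].
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(a), len(b)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd


def filter_genes(expr, labels, z_cut):
    """Keep genes whose |z| reaches z_cut (z_cut is an assumed cutoff).

    expr maps gene name -> list of expression values, one per sample;
    labels holds the class (0 or 1) of each sample in the same order.
    """
    kept = []
    for gene, values in expr.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        if abs(rank_sum_z(a, b)) >= z_cut:
            kept.append(gene)
    return kept


def frequent_genes(subsets, min_fraction):
    """Return genes appearing in at least min_fraction of the subsets.

    subsets stands in for the gene subsets that GA/SVM would select
    on different training sets (hypothetical input here).
    """
    counts = Counter(g for s in subsets for g in set(s))
    return sorted(g for g, c in counts.items()
                  if c / len(subsets) >= min_fraction)


# Toy data: g1 separates the two classes perfectly, g2 does not.
expr = {"g1": [1, 2, 3, 4, 5, 6], "g2": [1, 4, 2, 5, 3, 6]}
labels = [0, 0, 0, 1, 1, 1]
filtered = filter_genes(expr, labels, z_cut=1.9)

# Hypothetical subsets from three GA/SVM runs on different training sets.
runs = [["g1", "g3"], ["g1", "g2"], ["g1", "g3"]]
final_subset = frequent_genes(runs, min_fraction=0.6)
```

In this toy setting, a gene is retained in the final subset only if it recurs across most runs, which is the intuition behind ranking genes by their frequency of appearance.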
Cite this article
Li, S., Wu, X. & Hu, X. Gene selection using genetic algorithm and support vectors machines. Soft Comput 12, 693–698 (2008). https://doi.org/10.1007/s00500-007-0251-2
DOI: https://doi.org/10.1007/s00500-007-0251-2