Gene selection using genetic algorithm and support vectors machines

Soft Computing

Abstract

In this paper, we present a gene selection method based on a genetic algorithm (GA) and support vector machines (SVM) for cancer classification. First, the Wilcoxon rank sum test is used to filter out noisy and redundant genes in high-dimensional microarray data. Then, different subsets of highly informative genes are selected by GA/SVM using different training sets. The final subset, consisting of highly discriminating genes, is obtained by analyzing how frequently each gene appears across these subsets. The proposed method is tested on three open datasets: leukemia, breast cancer, and colon cancer. The results show that the proposed method achieves excellent selection and classification performance, especially on the breast cancer data, where it reaches 100% classification accuracy using only four genes.
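The three stages named in the abstract (rank-sum filtering, GA-based subset search on different training sets, and frequency analysis) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives no GA parameters, fitness function, or resampling scheme, so the population size, mutation rate, per-gene penalty, 80% resampling, and every function name below are hypothetical. To stay dependency-free, a leave-one-out nearest-centroid classifier stands in for the SVM; an SVM-based accuracy could be passed as the `fitness` argument instead. Class labels are assumed to be 0 and 1.

```python
import random
from collections import Counter

def rank_sum_z(a, b):
    """Wilcoxon rank-sum z-score of sample a against b (normal approximation,
    average ranks for ties, no tie correction of the variance)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks, i = [0.0] * len(pooled), 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        for k in range(i, j):
            ranks[k] = (i + j + 1) / 2  # mean of the 1-based ranks i+1..j
        i = j
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(a), len(b)
    mean, var = n1 * (n1 + n2 + 1) / 2, n1 * n2 * (n1 + n2 + 1) / 12
    return (w - mean) / var ** 0.5

def filter_genes(X, y, top_k):
    """Stage 1: keep the top_k gene indices with the largest |z| between classes."""
    def score(g):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        return abs(rank_sum_z(a, b))
    return sorted(range(len(X[0])), key=score, reverse=True)[:top_k]

def loo_centroid_accuracy(X, y, genes):
    """Stand-in fitness (NOT an SVM): leave-one-out nearest-centroid accuracy."""
    if not genes:
        return 0.0
    hits = 0
    for i, (row, lab) in enumerate(zip(X, y)):
        dist = {}
        for c in (0, 1):
            rest = [r for j, (r, l) in enumerate(zip(X, y)) if j != i and l == c]
            if not rest:  # class vanished after leaving sample i out
                dist[c] = float("inf")
                continue
            centroid = [sum(r[g] for r in rest) / len(rest) for g in genes]
            dist[c] = sum((row[g] - m) ** 2 for g, m in zip(genes, centroid))
        hits += (dist[0] <= dist[1]) == (lab == 0)
    return hits / len(y)

def ga_select(X, y, candidates, fitness, rng, pop=16, gens=10, p_mut=0.1):
    """Stage 2: tiny GA over bitmasks of candidate genes (needs >= 2 candidates)."""
    def genes_of(mask):
        return [g for g, bit in zip(candidates, mask) if bit]
    def score(mask):
        # Accuracy minus an assumed small penalty per selected gene.
        return fitness(X, y, genes_of(mask)) - 0.01 * sum(mask)
    population = [[rng.random() < 0.3 for _ in candidates] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=score, reverse=True)
        parents = population[: pop // 2]          # keep the better half
        children = []
        while len(parents) + len(children) < pop:
            p, q = rng.sample(parents, 2)
            cut = rng.randrange(1, len(candidates))  # one-point crossover
            child = p[:cut] + q[cut:]
            children.append([bit ^ (rng.random() < p_mut) for bit in child])
        population = parents + children
    return genes_of(max(population, key=score))

def frequent_genes(X, y, n_runs=10, top_k=5, n_final=4, seed=0):
    """Run all three stages and return the most frequently selected genes."""
    rng = random.Random(seed)
    candidates = filter_genes(X, y, top_k)
    counts = Counter()
    for _ in range(n_runs):  # a different random 80% training set per run
        train = rng.sample(range(len(y)), int(0.8 * len(y)))
        Xt, yt = [X[i] for i in train], [y[i] for i in train]
        counts.update(ga_select(Xt, yt, candidates, loo_centroid_accuracy, rng))
    return [g for g, _ in counts.most_common(n_final)]  # stage 3: frequency
```

Here `X` is a list of samples, each a list of expression values, and `y` holds the matching 0/1 labels; `frequent_genes(X, y)` returns up to `n_final` gene indices. Counting how often a gene survives across resampled runs favors genes whose usefulness does not depend on one particular training split, which is the idea the abstract describes.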

Corresponding author

Correspondence to Shutao Li.

Cite this article

Li, S., Wu, X. & Hu, X. Gene selection using genetic algorithm and support vectors machines. Soft Comput 12, 693–698 (2008). https://doi.org/10.1007/s00500-007-0251-2
