Optimal gene subset selection using the modified SFFS algorithm for tumor classification

  • Review
  • Neural Computing and Applications

Abstract

Reliable and precise classification of tumors is essential for the successful treatment of cancer, and gene selection is an important step toward improved diagnostics. In this study, we propose a modified SFFS (sequential forward floating selection) algorithm based on the weighted Mahalanobis distance, called MSWM, to identify optimal informative gene subsets while taking the genes' joint discriminatory power into account. First, we use the one-dimensional weighted Mahalanobis distance to perform a preliminary selection of genes. We then use the modified SFFS method together with the multidimensional weighted Mahalanobis distance to obtain the optimal informative gene subset for tumor classification. Finally, we use the k-nearest-neighbor and naive Bayes methods to classify tumors based on the gene subset selected by the MSWM method. To validate the method, we apply it to two different DNA microarray datasets. Our empirical study shows that the MSWM method classifies tumors more effectively than the BWR (the ratio of between-groups to within-groups sum of squares) and IVGA_I (independent variable group analysis I) methods. This suggests that the MSWM gene selection method is able to obtain correct informative gene subsets that account for the genes' joint discriminatory power in tumor classification.
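The two-stage procedure described in the abstract (a one-dimensional ranking followed by a floating search with a multidimensional criterion) can be illustrated with a minimal Python sketch. The abstract does not give the weighting scheme of the weighted Mahalanobis distance or the specific modifications made to SFFS, so this sketch makes several assumptions: it uses the plain, unweighted squared Mahalanobis distance between two class means with a pooled covariance as a stand-in criterion, it follows the standard floating search rather than the modified one, and it assumes class labels coded as 0 and 1. The function names `mahalanobis_separation`, `preselect`, and `sffs` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def mahalanobis_separation(X, y, subset):
    """Squared Mahalanobis distance between the two class means on `subset`.

    Assumption: unweighted stand-in for the paper's weighted distance.
    X is (samples x genes); y holds labels 0 and 1.
    """
    cols = list(subset)
    X0, X1 = X[y == 0][:, cols], X[y == 1][:, cols]
    diff = X0.mean(axis=0) - X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    # Pooled within-class covariance of the chosen genes.
    S = ((n0 - 1) * np.cov(X0, rowvar=False)
         + (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
    # Small ridge keeps S invertible when samples are few (an assumption).
    S = np.atleast_2d(S) + 1e-6 * np.eye(len(cols))
    return float(diff @ np.linalg.solve(S, diff))

def preselect(X, y, m):
    """Stage 1: keep the m genes with the largest one-dimensional criterion."""
    scores = [mahalanobis_separation(X, y, [g]) for g in range(X.shape[1])]
    return [int(g) for g in np.argsort(scores)[::-1][:m]]

def sffs(X, y, candidates, k, criterion=mahalanobis_separation):
    """Stage 2: standard sequential forward floating selection of k genes.

    Requires k <= len(candidates). Returns (subset, criterion value).
    """
    selected = []
    best = {}  # subset size -> (best criterion value, subset) seen so far

    def record(subset):
        # Store the subset if it beats the best one of the same size.
        score = criterion(X, y, subset)
        if score > best.get(len(subset), (-np.inf, None))[0]:
            best[len(subset)] = (score, list(subset))
            return True
        return False

    while len(selected) < k:
        # Forward step: add the gene that most improves the criterion.
        remaining = [g for g in candidates if g not in selected]
        gene = max(remaining, key=lambda g: criterion(X, y, selected + [g]))
        selected = selected + [gene]
        record(selected)
        # Floating step: remove genes while the reduced subset beats
        # the best subset of that size found earlier.
        while len(selected) > 2:
            drop = max(selected, key=lambda g: criterion(
                X, y, [s for s in selected if s != g]))
            reduced = [s for s in selected if s != drop]
            if record(reduced):
                selected = reduced
            else:
                break
    score, subset = best[k]
    return subset, score

if __name__ == "__main__":
    # Synthetic data only: 60 samples, 20 genes, two of them informative.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 20))
    y = np.repeat([0, 1], 30)
    X[y == 1, 4] += 5.0
    X[y == 1, 11] += 5.0
    candidates = preselect(X, y, 8)
    print(sffs(X, y, candidates, 3))
```

The selected columns could then be passed to any k-nearest-neighbor or naive Bayes classifier. Because the criterion is evaluated on whole subsets rather than on single genes, a gene that is weak alone can still be chosen when it improves separation jointly with others, which is the property the abstract emphasizes.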

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant Nos. 31071528, 71101095 and 11171117, the Natural Science Foundation of Guangdong Province, China under Grant No. S2011010002371, and the Ministry of Education of China Project of Humanities and Social Science under Grant No. 11YJCZH195.

Author information

Corresponding author

Correspondence to Hongyi Peng.

About this article

Cite this article

Peng, H., Fu, Y., Liu, J. et al. Optimal gene subset selection using the modified SFFS algorithm for tumor classification. Neural Comput & Applic 23, 1531–1538 (2013). https://doi.org/10.1007/s00521-012-1148-2

  • DOI: https://doi.org/10.1007/s00521-012-1148-2
