A composite gene selection for DNA microarray data analysis

Multimedia Tools and Applications

Abstract

An important aspect of microarray data analysis is selecting an appropriate number of the most relevant genes from a large population of genes. In this study, we propose a composite gene selection method that combines unsupervised and supervised gene selection. In the unsupervised stage, the threshold number of misclassification (TNoM) score is used to select an appropriate number of top-ranked genes. In the supervised stage, the minimum number of genes giving the highest accuracy is obtained from the top-ranked genes using the non-overlap area distribution measurement (NADM) method provided by the neural network with weighted fuzzy membership functions (NEWFM). From a colon cancer dataset of 2000 genes and a leukemia dataset of 7129 genes, we selected the 93 top-ranked colon cancer genes (TNoM score ≤14) and the 143 top-ranked leukemia genes (TNoM score ≤13). The NADM method then reduced these to a minimum of 4 colon cancer genes and 13 leukemia genes. With these 4 and 13 genes as inputs to the NEWFM, the classification accuracies were 98.39% for colon cancer and 100% for leukemia.
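The TNoM-based ranking step can be illustrated with a minimal sketch. It assumes the common definition of the TNoM score: for each gene, the smallest number of samples misclassified by any single-threshold rule on that gene's expression values, in either direction. The function names `tnom_score` and `select_by_tnom`, the samples-by-genes array layout, and the 0/1 label encoding are hypothetical choices made for illustration; this is not the authors' implementation, and it does not cover the NADM or NEWFM stages.

```python
import numpy as np


def tnom_score(expression, labels):
    """Fewest errors made by any single-threshold rule on one gene.

    expression: 1-D array of expression values, one per sample.
    labels:     1-D array of 0/1 class labels, one per sample.
    """
    expression = np.asarray(expression, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(expression, kind="stable")
    sorted_expr = expression[order]
    sorted_labels = labels[order]
    n = len(sorted_labels)

    # ones_left[k] = number of class-1 samples among the k lowest values.
    ones_left = np.concatenate(([0], np.cumsum(sorted_labels)))
    total_ones = ones_left[-1]
    k = np.arange(n + 1)

    # Rule A: predict 0 for the k lowest samples and 1 for the rest.
    errors_a = ones_left + ((n - k) - (total_ones - ones_left))
    # Rule B: the opposite direction, so its errors are the complement.
    errors_b = n - errors_a

    # A threshold can only fall between two distinct values (or at an end).
    valid = np.ones(n + 1, dtype=bool)
    valid[1:n] = sorted_expr[1:] != sorted_expr[:-1]
    return int(min(errors_a[valid].min(), errors_b[valid].min()))


def select_by_tnom(X, y, max_score):
    """Return indices of genes with TNoM score <= max_score, best first.

    X: array of shape (n_samples, n_genes); y: 0/1 labels per sample.
    """
    scores = np.array([tnom_score(X[:, j], y) for j in range(X.shape[1])])
    keep = np.flatnonzero(scores <= max_score)
    ranked = keep[np.argsort(scores[keep], kind="stable")]
    return ranked, scores
```

Under these assumptions, calling `select_by_tnom(X, y, max_score=14)` on the colon cancer expression matrix, or `max_score=13` on the leukemia matrix, would correspond to the score cut-offs reported in the abstract.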

Acknowledgements

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (2012R1A1A2044134).

This study was supported by a grant from the Korean Health Technology R&D Project, Ministry of Health & Welfare, Republic of Korea (A112020).

Author information

Corresponding author

Correspondence to Joon S. Lim.

About this article

Cite this article

Park, D.K., Jung, EY., Lee, SH. et al. A composite gene selection for DNA microarray data analysis. Multimed Tools Appl 74, 9031–9041 (2015). https://doi.org/10.1007/s11042-013-1583-9

  • DOI: https://doi.org/10.1007/s11042-013-1583-9
