Abstract
An important aspect of microarray data analysis is selecting an appropriate number of the most relevant genes from a large population of genes. In this study, we propose a composite gene selection method that combines unsupervised and supervised gene selection. In the unsupervised stage, we use the threshold number of misclassification (TNoM) score to select an appropriate number of top-ranked genes for microarray data analysis. In the supervised stage, the non-overlap area distribution measurement (NADM) method, provided by the neural network with weighted fuzzy membership functions (NEWFM), identifies the minimum number of top-ranked genes that yields the highest accuracy. From a colon cancer dataset of 2000 genes and a leukemia dataset of 7129 genes, we selected the 93 top-ranked colon cancer genes and the 143 top-ranked leukemia genes, namely those with TNoM scores of at most 14 (colon cancer) and at most 13 (leukemia). The NADM method then reduced these to a minimum of 4 colon cancer genes and 13 leukemia genes. With these 4 and 13 genes as inputs to the NEWFM, the classification accuracies were 98.39% for colon cancer and 100% for leukemia.
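The unsupervised stage ranks each gene by its TNoM score: the samples are split by a single expression threshold, and the score is the smallest number of misclassified samples over all thresholds and both orientations (higher expression indicating one class or the other). The sketch below illustrates that ranking and the score cutoff under stated assumptions. It assumes binary 0/1 class labels and a genes-by-samples matrix stored as plain Python lists. The function names `tnom_score` and `select_top_genes` are hypothetical and do not come from the paper, and the authors' exact implementation may differ. The `max_score` parameter stands in for the cutoffs of 14 (colon cancer) and 13 (leukemia) quoted above.

```python
from typing import List, Sequence


def tnom_score(values: Sequence[float], labels: Sequence[int]) -> int:
    """Threshold number of misclassification for one gene (hypothetical helper).

    Tries every threshold between consecutive distinct sorted expression values,
    in both orientations, and returns the fewest misclassified samples.
    Assumes labels are 0/1.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_pos = sum(lab for _, lab in pairs)
    total_neg = n - total_pos
    # Threshold outside the data range: every sample is assigned one class.
    best = min(total_pos, total_neg)
    pos_left = 0
    for i in range(n - 1):
        pos_left += pairs[i][1]
        # Split only between distinct values so tied samples stay together.
        if pairs[i][0] == pairs[i + 1][0]:
            continue
        neg_left = (i + 1) - pos_left
        pos_right = total_pos - pos_left
        neg_right = total_neg - neg_left
        # Orientation A: left -> class 0, right -> class 1.
        # Orientation B: left -> class 1, right -> class 0.
        errors = min(pos_left + neg_right, neg_left + pos_right)
        best = min(best, errors)
    return best


def select_top_genes(matrix: List[List[float]], labels: Sequence[int],
                     max_score: int) -> List[int]:
    """Return indices of genes with TNoM score <= max_score, best scores first."""
    scored = [(tnom_score(row, labels), g) for g, row in enumerate(matrix)]
    kept = sorted((s, g) for s, g in scored if s <= max_score)
    return [g for _, g in kept]


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    expression = [
        [1, 2, 3, 10, 11, 12],  # separates the two classes perfectly
        [5, 1, 9, 2, 8, 3],     # weakly informative
        [4, 4, 4, 4, 4, 4],     # constant, uninformative
    ]
    print([tnom_score(row, labels) for row in expression])   # [0, 2, 3]
    print(select_top_genes(expression, labels, max_score=2))  # [0, 1]
```

Because each gene is sorted once and then scanned linearly, scoring is O(n log n) in the number of samples per gene, which stays cheap even for several thousand genes. The supervised NADM/NEWFM stage is not sketched here because its details are not given in this abstract.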
Acknowledgements
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (2012R1A1A2044134).
This study was supported by a grant from the Korean Health Technology R&D Project, Ministry of Health & Welfare, Republic of Korea (A112020).
Cite this article
Park, D.K., Jung, EY., Lee, SH. et al. A composite gene selection for DNA microarray data analysis. Multimed Tools Appl 74, 9031–9041 (2015). https://doi.org/10.1007/s11042-013-1583-9