Abstract
In any microarray data set, only a small number of genes are informative, and the high dimensionality of the data makes gene selection a persistent challenge. To address this high-dimensional problem, we propose a dimensionality reduction algorithm named K-value maximum relevance minimum redundancy improved grey wolf optimizer (KMR2IGWO). First, the KMR2 stage selects K genes. Second, the K genes are used to initialize the population in two ways: by random feature selection and by selecting features in different proportions. Finally, the IGWO algorithm searches for the combination of genes with the best classification accuracy by adjusting the parameters of the fitness function. The algorithm has a significant dimensionality reduction effect and is well suited to high-dimensional data sets. Experimental results show that the proposed KMR2IGWO strategy markedly reduces the dimension of microarray data and removes redundant features. On 14 microarray data sets, compared with the four algorithms mRMR + PSO, mRMR + GA, mRMR + BA, and mRMR + CS, the proposed algorithm achieves higher classification accuracy with shorter feature subsets. On five data sets, its average classification accuracy is 100%. Across the 14 data sets, the proposed algorithm reduces the dimensionality dramatically, retaining between 0.4% and 0.04% of the original features.
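The filter stage described above (maximum relevance, minimum redundancy) can be sketched as a greedy mutual-information selector. This is a minimal illustration, not the authors' implementation: the `mutual_info` and `mrmr_select` helpers, the scoring form relevance minus mean redundancy, and the tiny discretized toy matrix are all assumptions chosen for clarity.

```python
# Hedged sketch of an mRMR-style filter stage: greedily pick K genes that
# maximize mutual information with the class label while minimizing mean
# redundancy with already-selected genes. Toy data is illustrative only.
import numpy as np

def mutual_info(x, y):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    n = len(x)
    joint = {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
    px = {a: np.sum(x == a) / n for a in set(x.tolist())}
    py = {b: np.sum(y == b) / n for b in set(y.tolist())}
    mi = 0.0
    for (a, b), count in joint.items():
        p = count / n
        mi += p * np.log(p / (px[a] * py[b]))
    return mi

def mrmr_select(X, y, k):
    """Greedy mRMR: score(f) = relevance(f, y) - mean redundancy(f, selected)."""
    n_features = X.shape[1]
    relevance = [mutual_info(X[:, j], y) for j in range(n_features)]
    selected = [int(np.argmax(relevance))]   # seed with the most relevant gene
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(X[:, j], X[:, s]) for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

# Toy discretized expression matrix: 6 samples x 4 genes, binary class labels.
X = np.array([[1, 0, 1, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1]])
y = np.array([1, 1, 0, 0, 1, 0])
print(mrmr_select(X, y, k=2))
```

In the full KMR2IGWO pipeline, the K genes surviving this filter would then seed the wrapper-stage search (the IGWO population), whose fitness balances classification accuracy against subset length.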
Acknowledgements
This research is supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61602206.
Cite this article
Zheng, Y., Li, Y., Wang, G. et al. A hybrid feature selection algorithm for microarray data. J Supercomput 76, 3494–3526 (2020). https://doi.org/10.1007/s11227-018-2640-y