Abstract
Feature selection is an important filtering method for data analysis, pattern classification, and data mining. It reduces the number of features by removing irrelevant and redundant ones. In this paper, we propose a hybrid filter–wrapper feature subset selection algorithm called the maximum Spearman minimum covariance cuckoo search (MSMCCS). First, we propose a filter algorithm based on Spearman correlation and covariance, called maximum Spearman minimum covariance (MSMC). Second, we introduce three parameters in MSMC that adjust the weights of correlation and redundancy, so as to improve the relevance of the feature subsets and reduce their redundancy. Third, in the improved cuckoo search algorithm, a weighted combination strategy selects candidate feature subsets, a crossover mutation concept adjusts them, and the filtered features are finally selected into optimal feature subsets. MSMCCS thus combines the efficiency of filters with the greater accuracy of wrappers. Experimental results on eight common data sets from the University of California at Irvine Machine Learning Repository showed that MSMCCS achieved higher classification accuracy than seven wrapper methods, one filter method, and two hybrid methods. Furthermore, the proposed algorithm performed better on the Wilcoxon signed-rank test and the sensitivity–specificity test.
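The abstract does not state the MSMC scoring formula, so the following is only a minimal sketch of the general idea of a "maximum Spearman, minimum covariance" filter, not the authors' method. It assumes an mRMR-style greedy criterion in which relevance is the absolute Spearman correlation between a feature and the class label, and redundancy is the mean absolute covariance between a candidate and the features already chosen. The function names and the weights `alpha` and `beta` are hypothetical; the paper's three parameters are not reproduced here.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def spearman(x, y):
    """Spearman rank correlation, computed as Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    denom = (covariance(rx, rx) * covariance(ry, ry)) ** 0.5
    return covariance(rx, ry) / denom if denom else 0.0


def msmc_like_select(columns, labels, k, alpha=1.0, beta=1.0):
    """Greedily pick k feature indices by alpha * relevance - beta * redundancy.

    alpha and beta are assumed illustrative weights, not the paper's parameters.
    """
    relevance = [abs(spearman(col, labels)) for col in columns]
    selected = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return alpha * relevance[j]
            redundancy = mean(
                abs(covariance(columns[j], columns[s])) for s in selected
            )
            return alpha * relevance[j] - beta * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: feature 1 is nearly a copy of feature 0, feature 2 is on a smaller scale.
features = [
    [1, 2, 3, 4, 5, 6],
    [1, 2, 4, 3, 5, 6],
    [0.3, 0.1, 0.4, 0.2, 0.6, 0.5],
]
labels = [0, 0, 0, 1, 1, 1]
print(msmc_like_select(features, labels, k=2))
```

Because covariance is scale-dependent, features would normally be standardized before scoring; the sketch skips that step for brevity. In a hybrid scheme such as the one the abstract describes, a ranking like this would only supply candidates to the wrapper stage, which is not shown here.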
Acknowledgments
This research is supported by the National Natural Science Foundation of China (NSFC) under grant no. 61602206.
Cite this article
Zheng, Y., Li, Y., Wang, G. et al. A novel hybrid algorithm for feature selection. Pers Ubiquit Comput 22, 971–985 (2018). https://doi.org/10.1007/s00779-018-1156-z
DOI: https://doi.org/10.1007/s00779-018-1156-z