Abstract
As data sets grow in both size and dimensionality, feature selection is attracting increasing attention. In this paper, we propose a novel feature selection algorithm, Hybrid filter and Symmetrical Complementary Coefficient based Multi-Objective Genetic Algorithm feature selection (HSMOGA). HSMOGA combines three components: a new hybrid filter; the Symmetrical Complementary Coefficient, a recently proposed metric that performs well at quantifying feature interactions; and a novel way to limit the size of the feature subset. We also propose a new Pareto-based ranking function for solving the multi-objective problem. In addition, HSMOGA begins with a novel step called knowledge reserve, which precalculates the knowledge required for computing the fitness function and generating the initial population. As a result, HSMOGA is classifier-independent in every generation, and its initial population generation makes full use of knowledge about the data set, which makes solutions converge faster. Compared with other GA-based feature selection methods, HSMOGA has a much lower time complexity. According to the experimental results, HSMOGA outperforms nine other state-of-the-art feature selection algorithms, five classic and four more recent, in terms of kappa coefficient, accuracy, and G-mean on the data sets tested.
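The abstract does not define HSMOGA's new Pareto-based ranking function, so the sketch below shows only the standard Pareto (non-dominated) ranking that such functions build on, in the style of the fast non-dominated sort from NSGA-II. It assumes two hypothetical objectives per candidate feature subset, both minimized: (1 − filter score, number of selected features). The names `dominates` and `pareto_ranks` are illustrative and do not come from the paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if solution a Pareto-dominates b (every objective is minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_ranks(objectives: List[Sequence[float]]) -> List[int]:
    """Return each solution's non-dominated front index (0 = best front).

    Generic fast non-dominated sorting, not HSMOGA's own ranking function.
    """
    n = len(objectives)
    dominator_count = [0] * n               # how many solutions dominate i
    dominated = [[] for _ in range(n)]      # solutions that i dominates
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objectives[i], objectives[j]):
                dominated[i].append(j)
            elif dominates(objectives[j], objectives[i]):
                dominator_count[i] += 1

    ranks = [0] * n
    front = [i for i in range(n) if dominator_count[i] == 0]
    level = 0
    while front:
        next_front = []
        for i in front:
            ranks[i] = level
            # Removing front i releases the solutions it dominates.
            for j in dominated[i]:
                dominator_count[j] -= 1
                if dominator_count[j] == 0:
                    next_front.append(j)
        front = next_front
        level += 1
    return ranks


if __name__ == "__main__":
    # Hypothetical subsets scored as (1 - filter score, subset size).
    population = [(0.10, 5), (0.20, 3), (0.15, 6), (0.30, 3), (0.35, 7)]
    print(pareto_ranks(population))  # → [0, 0, 1, 1, 2]
```

Solutions on front 0 trade filter quality against subset size without either being strictly better, which is why a GA typically prefers lower fronts during selection. A ranking function like HSMOGA's would refine or replace this ordering; the sketch only illustrates the underlying dominance relation.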
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
Additional information
This work is supported by the National Natural Science Foundation of China under Grant 51727813.
About this article
Cite this article
Zhang, R., Zhang, Z., Wang, D. et al. Feature selection with multi-objective genetic algorithm based on a hybrid filter and the symmetrical complementary coefficient. Appl Intell 51, 3899–3916 (2021). https://doi.org/10.1007/s10489-020-02028-0
DOI: https://doi.org/10.1007/s10489-020-02028-0