Feature selection with multi-objective genetic algorithm based on a hybrid filter and the symmetrical complementary coefficient


Abstract

With the growth of data size and dimensionality, feature selection has attracted increasing attention. In this paper, we propose a novel feature selection algorithm, Hybrid filter and Symmetrical Complementary Coefficient based Multi-Objective Genetic Algorithm feature selection (HSMOGA). HSMOGA combines a new hybrid filter, the Symmetrical Complementary Coefficient (a recently proposed, well-performing measure of feature interaction), and a novel way to limit the size of the feature subset. A new Pareto-based ranking function is proposed for solving the multi-objective problem. In addition, HSMOGA starts with a novel step called knowledge reserve, which precalculates the knowledge required for fitness-function evaluation and initial population generation. As a result, HSMOGA is classifier-independent in each generation, and its initial population makes full use of knowledge of the data set, so solutions converge faster. Compared with other GA-based feature selection methods, HSMOGA has a much lower time complexity. Experimental results show that HSMOGA outperforms nine other state-of-the-art feature selection algorithms, five classic and four more recent, in terms of kappa coefficient, accuracy, and G-mean on the data sets tested.
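For intuition, the following minimal Python sketch illustrates two of the ideas named above: a one-time "knowledge reserve" precomputation, so that per-generation fitness evaluation needs no classifier calls, and a Pareto-based ranking of candidate feature subsets. This is not the authors' HSMOGA implementation; the stand-in filter score (absolute Pearson correlation), the two objectives, the dominance-count ranking variant, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def knowledge_reserve(X, y):
    # Precompute a per-feature relevance score once (stand-in hybrid filter:
    # absolute Pearson correlation of each feature with the label).
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc
    den = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    return np.abs(num / den)

def objectives(mask, score):
    # Two objectives to maximize: total precomputed relevance of the
    # selected features, and compactness (negated subset size).
    return np.array([score[mask].sum(), -mask.sum()])

def pareto_rank(F):
    # Dominance-count Pareto ranking: rank 0 = non-dominated front;
    # otherwise rank = number of solutions that dominate this one.
    n = len(F)
    rank = np.zeros(n, dtype=int)
    for i in range(n):
        for j in range(n):
            if np.all(F[j] >= F[i]) and np.any(F[j] > F[i]):
                rank[i] += 1
    return rank

# Toy run: 100 samples, 20 features, a population of 30 random feature masks.
X = rng.normal(size=(100, 20))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.1, size=100) > 0).astype(float)
score = knowledge_reserve(X, y)      # computed once, reused every generation
pop = rng.random((30, 20)) < 0.3     # boolean masks: True = feature selected
F = np.array([objectives(m, score) for m in pop])
print("solutions per Pareto rank:", np.bincount(pareto_rank(F)))
```

Because `score` is computed once up front, each generation of this sketch costs only cheap array operations, which mirrors in spirit the classifier-independence the abstract claims for HSMOGA's per-generation fitness evaluation.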


References

1. Abdi H, Williams LJ (2010) Tukey's honestly significant difference (HSD) test. Encyclopedia of Research Design 3:583–585

2. Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140

3. Breiman L (2001) Random forests. Mach Learn 45(1):5–32

4. Colas C, Madhavan V, Huizinga J, Clune J (2020) Scaling MAP-Elites to deep neuroevolution. In: GECCO 2020, pp 67–75

5. Cover TM, Thomas JA (2012) Elements of information theory. Wiley, Berlin

6. Das AK, Pati SK, Ghosh A (2019) Relevant feature selection and ensemble classifier design using bi-objective genetic algorithm. Knowl Inf Syst 62(2):423–455

7. Das S (2001) Filters, wrappers and a boosting-based hybrid for feature selection. In: ICML 2001, vol 1, pp 74–81

8. Davis L (1991) Handbook of genetic algorithms. Van Nostrand Reinhold, New York

9. Deb K, Agrawal S, Pratap A, Meyarivan T (2000) A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II. In: PPSN VI. Springer, pp 849–858

10. Dua D, Graff C (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml

11. Fernandes K, Cardoso JS, Fernandes J (2017) Transfer learning with partial observability applied to cervical cancer screening. In: IbPRIA 2017. Springer, pp 243–250

12. Fioravanzo S, Iacca G (2019) Evaluating MAP-Elites on constrained optimization problems. arXiv:1902.00703

13. Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675–701

14. Gao W, Hu L, Zhang P, He J (2018) Feature selection considering the composition of feature relevancy. Pattern Recogn Lett 112:70–74

15. Ghareb AS, Bakar AA, Hamdan AR (2016) Hybrid feature selection based on enhanced genetic algorithm for text categorization. Expert Syst Appl 49:31–47

16. Ghosh M, Guha R, Sarkar R, Abraham A (2019) A wrapper-filter feature selection technique based on ant colony optimization. Neural Comput Appl 32(12):7839–7857

17. Gonzalez-Abril L, Cuberos FJ, Velasco F, Ortega JA (2009) Ameva: an autonomous discretization algorithm. Expert Syst Appl 36(3):5327–5332

18. González-López J, Ventura S, Cano A (2019) Distributed selection of continuous features in multilabel classification using mutual information. IEEE T Neur Net Lear

19. González-López J, Ventura S, Cano A (2020) Distributed multi-label feature selection using individual mutual information measures. Knowl-Based Syst 188:105052

20. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3(6):1157–1182

21. Hammami M, Bechikh S, Hung CC, Said LB (2019) A multi-objective hybrid filter-wrapper evolutionary approach for feature selection. Memet Comput 11(2):193–208

22. Hsu HH, Hsieh CW, Lu MD (2011) Hybrid feature selection by combining filters and wrappers. Expert Syst Appl 38(7):8144–8150

23. Huang J, Cai Y, Xu X (2007) A hybrid genetic algorithm for feature selection wrapper based on mutual information. Pattern Recogn Lett 28(13):1825–1844

24. Jadhav S, He H, Jenkins K (2018) Information gain directed genetic algorithm wrapper feature selection for credit rating. Appl Soft Comput 69:541–553

25. Jakulin A, Bratko I (2004) Testing the significance of attribute interactions. In: ICML 2004, pp 409–416

26. John GH, Kohavi R, Pfleger K (1994) Irrelevant features and the subset selection problem. In: ML 94. Elsevier, pp 121–129

27. Konak A, Coit DW, Smith AE (2006) Multi-objective optimization using genetic algorithms: a tutorial. Reliab Eng Syst Safe 91(9):992–1007

28. Kononenko I (1994) Estimating attributes: analysis and extensions of RELIEF. In: ECML-94, pp 171–182

29. Kursa MB, Rudnicki WR, et al. (2010) Feature selection with the Boruta package. J Stat Softw 36(11):1–13

30. Mesejo P, Pizarro D, Abergel A, Rouquette O, Beorchia S, Poincloux L, Bartoli A (2016) Computer-aided classification of gastrointestinal lesions in regular colonoscopy. IEEE T Med Imaging 35(9):2051–2063

31. Mouret JB, Clune J (2015) Illuminating search spaces by mapping elites. arXiv:1504.04909

32. Nemenyi P (1963) Distribution-free multiple comparisons. PhD thesis

33. Ng AY (2004) Feature selection, L1 vs. L2 regularization, and rotational invariance. In: ICML 2004. ACM, p 78

34. Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE T Pattern Anal 27(8):1226–1238

35. Quinlan JR (1986) Induction of decision trees. Mach Learn 1(1):81–106

36. Quinlan JR (2014) C4.5: Programs for machine learning. Elsevier

37. Quinonez B, Pinto-Roa DP, García-Torres M, García-Díaz ME, Núñez-Castillo C, Divina F (2019) MAP-Elites algorithm for features selection problem. In: AMW

38. Sakar CO, Polat SO, Katircioglu M, Kastro Y (2019) Real-time prediction of online shoppers' purchasing intention using multilayer perceptron and LSTM recurrent neural networks. Neural Comput Appl 31(10):6893–6908

39. Shieh MD, Yang CC (2008) Multiclass SVM-RFE for product form feature selection. Expert Syst Appl 35(1):531–541

40. Song X, Zhang Y, Guo Y, Sun X, Wang Y (2020) Variable-size cooperative coevolutionary particle swarm optimization for feature selection on high-dimensional data. IEEE T Evolut Comput 24(5):882–895

41. Tang X, Dai Y, Xiang Y (2019) Feature selection based on feature interactions with application to text categorization. Expert Syst Appl 120:207–216

42. Tsanas A, Little MA, Fox C, Ramig LO (2013) Objective automatic assessment of rehabilitative speech treatment in Parkinson's disease. IEEE T Neur Sys Reh 22(1):181–190

43. Wang G, Song Q (2012) Selecting feature subset via constraint association rules. In: PAKDD 2012, pp 304–321

44. Wang H, Lo SH, Zheng T, Hu I (2012) Interaction-based feature selection and classification for high-dimensional biological data. Bioinformatics 28(21):2834–2842

45. Wang Y, Feng L (2018) Hybrid feature selection using component co-occurrence based feature relevance measurement. Expert Syst Appl 102:83–99

46. Wang Y, Feng L (2019) A new hybrid feature selection based on multi-filter weights and multi-feature weights. Appl Intell 49:4033–4057

47. Yeh IC, Lien CH (2009) The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Syst Appl 36(2):2473–2480

48. Yu L, Liu H (2004) Efficient feature selection via analysis of relevance and redundancy. J Mach Learn Res 5(12):1205–1224

49. Zeng Z, Zhang H, Zhang R, Yin C (2015) A novel feature selection method considering feature interaction. Pattern Recogn 48(8):2656–2666

50. Zhang R, Zhang Z (2020) Feature selection with symmetrical complementary coefficient for quantifying feature interactions. Appl Intell 50:101–118

51. Zhang Y, Gong DW, Cheng J (2015) Multi-objective particle swarm optimization approach for cost-based feature selection in classification. IEEE ACM T Comput Bi 14(1):64–75

52. Zhang Y, Cheng S, Shi Y, Gong DW, Zhao X (2019) Cost-sensitive feature selection using two-archive multi-objective artificial bee colony algorithm. Expert Syst Appl 137:46–58

53. Zhang Y, Gong D, Gao X, Tian T, Sun X (2020) Binary differential evolution with self-learning for multi-objective feature selection. Inform Sciences 507:67–85

54. Zięba M, Tomczak JM, Lubicz M, Świątek J (2014) Boosted SVM for extracting rules from imbalanced data in application to prediction of the post-operative life expectancy in the lung cancer patients. Appl Soft Comput 14:99–108


Acknowledgements

We thank the UCI repository for the data sets provided. The Thoracic data set was from [54]. The Online data set was from [38]. The Default data set was from [47]. The Quality data set was from [11]. The LSVT data set was from [42]. The Gastro1 and Gastro2 data sets were from [30].

Author information


Corresponding author

Correspondence to Zuoquan Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is supported by the National Natural Science Foundation of China under Grant 51727813.


About this article


Cite this article

Zhang, R., Zhang, Z., Wang, D. et al. Feature selection with multi-objective genetic algorithm based on a hybrid filter and the symmetrical complementary coefficient. Appl Intell 51, 3899–3916 (2021). https://doi.org/10.1007/s10489-020-02028-0

