
A Hybrid Approach Based on Genetic Algorithm with Ranking Aggregation for Feature Selection

  • Conference paper
  • In: Advances and Trends in Artificial Intelligence. Theory and Practices in Artificial Intelligence (IEA/AIE 2022)

Abstract

Recently, feature selection has become a challenge for many machine learning disciplines. The success of most existing approaches depends on the effectiveness of the search strategy used to select the most salient features from the original feature space. Unfortunately, these approaches may become impractical on high-dimensional datasets. To overcome this problem, recent studies rely on a boundary scheme (i.e., fixing the number of selected features) to reduce the search space, or on a ranking scheme (e.g., preferring features with less correlated scores) to guide the selection phase. However, choosing the best-fitted size for the feature subset is itself a hard problem, and relying on a single feature-comparison criterion may overlook important features. In this paper, we propose a genetic algorithm (GA) that jointly optimizes the feature subset and the number of selected features so as to maximize the performance of an Artificial Neural Network (ANN) classifier. To improve the efficiency of the selection phase, we combine the proposed GA with a local search algorithm based on a ranking aggregation approach; the objective is to speed up the search by exploiting several feature-scoring criteria at once. We have assessed the performance of our approach on three categories of datasets: small-, medium- and high-dimensional (the smallest and largest datasets contain 8 and 7129 features, respectively). The empirical results show that our approach outperforms other state-of-the-art works on medium- and high-dimensional datasets and is comparable to them on small-dimensional datasets.
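The ranking-aggregation idea the abstract describes can be illustrated with a Borda count, one classic way to merge rankings produced by different feature-scoring criteria. This is a minimal sketch, not the authors' implementation; the feature names and the three criteria lists below are hypothetical.

```python
from typing import Dict, List

def borda_aggregate(rankings: List[List[str]]) -> List[str]:
    """Merge several feature rankings with a Borda count: a feature at
    position p (0-based) in a ranking of n features earns n - p points."""
    scores: Dict[str, int] = {}
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] = scores.get(feature, 0) + (n - position)
    # Highest total points first; ties broken by name for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))

# Three hypothetical criteria (e.g. mutual information, Fisher score,
# correlation) disagree on the order of four features:
by_mi     = ["f3", "f1", "f2", "f4"]
by_fisher = ["f1", "f3", "f4", "f2"]
by_corr   = ["f3", "f2", "f1", "f4"]

consensus = borda_aggregate([by_mi, by_fisher, by_corr])
# → ["f3", "f1", "f2", "f4"]
```

A local search guided by such a consensus ranking can bias mutation toward highly ranked features without committing to any single scoring criterion.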


Notes

  1. The accuracy of the classification model is calculated using the testing dataset.
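The footnote describes how a candidate feature subset is scored: by the accuracy a classifier reaches on a held-out testing dataset. The sketch below is an assumption-laden stand-in, using a toy nearest-centroid classifier instead of the paper's ANN so it stays stdlib-only; the data and the binary feature mask are hypothetical.

```python
from statistics import mean

def nearest_centroid_accuracy(train, test, mask):
    """Fit class centroids on `train` and return accuracy on `test`,
    using only the features where mask[i] == 1.
    Each sample is a (feature_vector, label) pair."""
    keep = [i for i, bit in enumerate(mask) if bit]
    if not keep:  # an empty subset is given the worst possible fitness
        return 0.0
    centroids = {}
    for label in sorted({y for _, y in train}):
        rows = [[x[i] for i in keep] for x, y in train if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def predict(x):
        v = [x[i] for i in keep]
        return min(centroids, key=lambda c: sum(
            (a - b) ** 2 for a, b in zip(v, centroids[c])))

    return sum(predict(x) == y for x, y in test) / len(test)

# Two informative features (0 and 1) and one noise feature (2):
train = [([0.0, 0.1, 9.0], 0), ([0.2, 0.0, 1.0], 0),
         ([1.0, 1.1, 8.0], 1), ([0.9, 1.0, 2.0], 1)]
test  = [([0.1, 0.0, 5.0], 0), ([1.0, 0.9, 5.0], 1)]

acc_informative = nearest_centroid_accuracy(train, test, [1, 1, 0])  # 1.0
acc_noise_only  = nearest_centroid_accuracy(train, test, [0, 0, 1])  # 0.5
```

In a GA wrapper, this held-out accuracy (possibly combined with a subset-size penalty) would serve directly as the fitness of a chromosome encoding the feature mask.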


Author information

Correspondence to Bui Thi Mai Anh.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Trung, B.Q., Duc, L.M., Anh, B.T.M. (2022). A Hybrid Approach Based on Genetic Algorithm with Ranking Aggregation for Feature Selection. In: Fujita, H., Fournier-Viger, P., Ali, M., Wang, Y. (eds) Advances and Trends in Artificial Intelligence. Theory and Practices in Artificial Intelligence. IEA/AIE 2022. Lecture Notes in Computer Science(), vol 13343. Springer, Cham. https://doi.org/10.1007/978-3-031-08530-7_19


  • DOI: https://doi.org/10.1007/978-3-031-08530-7_19


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08529-1

  • Online ISBN: 978-3-031-08530-7

  • eBook Packages: Computer Science (R0)
