Trade Between Population Size and Mutation Rate for GAAM (Genetic Algorithm with Aggressive Mutation) for Feature Selection

  • Conference paper
Artificial Intelligence Applications and Innovations (AIAI 2022)

Abstract

The “curse of dimensionality” describes the many difficulties that arise in machine learning tasks as the number of features in a dataset increases. One way to mitigate this problem is to reduce the number of features provided to the model during the learning phase. This reduction can be done in two ways: by merging dimensions together or by selecting a subset of dimensions. Many methods exist for selecting the dimensions to keep. One technique is to use a genetic algorithm to find a subset of dimensions that maximizes the accuracy of the classifier. A genetic algorithm created specifically for this purpose is the genetic algorithm with aggressive mutation (GAAM). This very efficient algorithm differs from classical genetic algorithms in several ways, the main one being that its population consists of a small number of individuals that are aggressively mutated. Our contribution is a modified version of the algorithm in which the number of mutated individuals is reduced in favor of a larger population. We compared our method to the original on 17 datasets and conclude that it provides better results than the original algorithm while reducing the computation time.
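
The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no algorithmic details, so the encoding (each individual is a fixed-length list of feature indices), the one-mutant-per-gene mutation, the one-point crossover, and the elitist selection are all assumptions. The names `toy_fitness`, `evolve`, and `n_mutated` are hypothetical, and `toy_fitness` is a stand-in for the classifier accuracy used in the paper.

```python
import random

# Hypothetical toy problem: 20 features, of which 5 are "informative".
N_FEATURES = 20
INFORMATIVE = {1, 4, 7, 12, 18}


def toy_fitness(individual):
    """Stand-in for classifier accuracy: count of distinct informative features."""
    return len(set(individual) & INFORMATIVE)


def mutate_gene(individual, position, n_features, rng):
    """Copy `individual` and replace the gene at `position` with a random feature."""
    child = list(individual)
    child[position] = rng.randrange(n_features)
    return child


def one_point_crossover(a, b, rng):
    """Swap the tails of two parents at a random cut point (needs >= 2 genes)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def evolve(fitness, n_features, n_genes, pop_size, n_mutated, generations, seed=0):
    """Return (best individual, best fitness recorded after each generation)."""
    rng = random.Random(seed)
    population = [[rng.randrange(n_features) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Aggressive mutation (assumed form): each chosen parent yields
        # one mutant per gene.
        chosen = rng.sample(population, min(n_mutated, pop_size))
        mutants = [mutate_gene(p, g, n_features, rng)
                   for p in chosen for g in range(n_genes)]
        # Crossover on randomly paired parents.
        shuffled = population[:]
        rng.shuffle(shuffled)
        offspring = []
        for a, b in zip(shuffled[::2], shuffled[1::2]):
            offspring.extend(one_point_crossover(a, b, rng))
        # Parents stay in the pool, so the best fitness never decreases.
        pool = population + mutants + offspring
        pool.sort(key=fitness, reverse=True)
        population = pool[:pop_size]
        history.append(fitness(population[0]))
    return population[0], history


# Small population, every individual mutated (the "aggressive" setting).
best_small, hist_small = evolve(toy_fitness, N_FEATURES, n_genes=4,
                                pop_size=4, n_mutated=4, generations=30)
# Larger population, fewer individuals mutated (the trade-off being studied).
best_large, hist_large = evolve(toy_fitness, N_FEATURES, n_genes=4,
                                pop_size=8, n_mutated=2, generations=30)
print(best_small, hist_small[-1])
print(best_large, hist_large[-1])
```

With these arbitrary parameters both settings evaluate 24 candidates per generation (4 parents + 16 mutants + 4 offspring versus 8 parents + 8 mutants + 8 offspring), so the sketch only shows how population size can be traded against the number of mutated individuals. It says nothing about which setting performs better on real data.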

Supported by Synaltic: www.synaltic.fr.

Acknowledgements

I gratefully acknowledge Astrid Balick for her generous support. This work was supported by Synaltic.

Author information

Corresponding author

Correspondence to Marc Chevallier.

Copyright information

© 2022 IFIP International Federation for Information Processing

About this paper

Cite this paper

Chevallier, M., Grozavu, N., Boufarès, F., Rogovschi, N., Clairmont, C. (2022). Trade Between Population Size and Mutation Rate for GAAM (Genetic Algorithm with Aggressive Mutation) for Feature Selection. In: Maglogiannis, I., Iliadis, L., Macintyre, J., Cortez, P. (eds) Artificial Intelligence Applications and Innovations. AIAI 2022. IFIP Advances in Information and Communication Technology, vol 646. Springer, Cham. https://doi.org/10.1007/978-3-031-08333-4_35

  • DOI: https://doi.org/10.1007/978-3-031-08333-4_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08332-7

  • Online ISBN: 978-3-031-08333-4

  • eBook Packages: Computer Science, Computer Science (R0)
