Trade Between Population Size and Mutation Rate for GAAM (Genetic Algorithm with Aggressive Mutation) for Feature Selection

Chevallier, Marc; Grozavu, Nistor; Boufarès, Faouzi; Rogovschi, Nicoleta; Clairmont, Charly

doi:10.1007/978-3-031-08333-4_35

Marc Chevallier ORCID: orcid.org/0000-0002-7983-6147

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 646)

Included in the following conference series:

IFIP International Conference on Artificial Intelligence Applications and Innovations

Abstract

The “curse of dimensionality” refers to the many difficulties that arise in machine learning tasks as the number of features in a dataset increases. One way to mitigate this problem is to reduce the number of features provided to the model during the learning phase. This reduction can be done in two ways: by merging dimensions together or by selecting a subset of dimensions. There are many methods for selecting the dimensions to keep. One technique is to use a genetic algorithm to find a subset of dimensions that maximizes the accuracy of the classifier. A genetic algorithm created specifically for this purpose is the genetic algorithm with aggressive mutation (GAAM). This very efficient algorithm differs from classical genetic algorithms in several respects. The main one is that its population is composed of a small number of individuals that are aggressively mutated. Our contribution is a modification of this algorithm: we propose a version in which the number of mutated individuals is reduced in favor of a larger population. We compared our method to the original one on 17 datasets and conclude that it provides better results than the original algorithm while reducing the computation time.
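
The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that an individual is a fixed-length tuple of feature indices, that aggressive mutation produces one mutant per gene of every parent, and that the next generation is the best individuals from the pool of parents, mutants and crossover offspring. The names `gaam_sketch`, `mutate`, `n_mutants` and `toy_fitness` are hypothetical, and `toy_fitness` is only a placeholder for the classifier accuracy used in practice.

```python
import random

# Hypothetical stand-in for classifier accuracy: counts how many selected
# features belong to an assumed set of informative ones.
INFORMATIVE = {2, 5, 7, 11}


def toy_fitness(individual):
    return len(set(individual) & INFORMATIVE)


def mutate(individual, n_features, rng, n_mutants=None):
    """Return mutants of `individual`, each with one gene position re-drawn.

    n_mutants=None re-draws every position once (one mutant per gene);
    a smaller n_mutants re-draws only that many randomly chosen positions.
    """
    positions = list(range(len(individual)))
    if n_mutants is not None:
        positions = rng.sample(positions, n_mutants)
    mutants = []
    for pos in positions:
        child = list(individual)
        child[pos] = rng.randrange(n_features)
        mutants.append(tuple(child))
    return mutants


def one_point_crossover(a, b, rng):
    """Swap the tails of two parents at a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def gaam_sketch(fitness, n_features, genes=4, pop_size=6,
                n_mutants=None, generations=30, seed=0):
    rng = random.Random(seed)
    population = [tuple(rng.randrange(n_features) for _ in range(genes))
                  for _ in range(pop_size)]
    for _ in range(generations):
        pool = list(population)  # parents stay in the pool (elitism)
        for individual in population:
            pool.extend(mutate(individual, n_features, rng, n_mutants))
        parents = list(population)
        rng.shuffle(parents)
        for a, b in zip(parents[::2], parents[1::2]):
            pool.extend(one_point_crossover(a, b, rng))
        # Keep the best pop_size individuals of the whole pool.
        pool.sort(key=fitness, reverse=True)
        population = pool[:pop_size]
    return max(population, key=fitness)


# Small population, every gene mutated (assumed original-style setting).
original_style = gaam_sketch(toy_fitness, n_features=20, pop_size=6)
# Larger population, fewer mutants per parent (assumed modified setting).
traded_style = gaam_sketch(toy_fitness, n_features=20, pop_size=12, n_mutants=2)
print(original_style, toy_fitness(original_style))
print(traded_style, toy_fitness(traded_style))
```

In this sketch the number of fitness evaluations per generation is dominated by `pop_size * genes` when every gene is mutated, and by `pop_size * n_mutants` otherwise, which is the quantity the two settings trade against population size. The specific values used here are illustrative and are not taken from the paper.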

Supported by Synaltic: www.synaltic.fr.

Acknowledgements

I gratefully acknowledge Astrid Balick for her generous support. Supported by Synaltic.

Author information

Authors and Affiliations

LIPN Laboratory, Sorbonne Paris Nord University, Villetaneuse, France
Marc Chevallier, Nistor Grozavu, Faouzi Boufarès, Nicoleta Rogovschi & Charly Clairmont

Corresponding author

Correspondence to Marc Chevallier.

Editor information

Editors and Affiliations

University of Piraeus, Piraeus, Greece
Ilias Maglogiannis
Democritus University of Thrace, Xanthi, Greece
Lazaros Iliadis
University of Sunderland, Sunderland, UK
John Macintyre
Universidade do Minho, Guimaraes, Portugal
Paulo Cortez

Cite this paper

Chevallier, M., Grozavu, N., Boufarès, F., Rogovschi, N., Clairmont, C. (2022). Trade Between Population Size and Mutation Rate for GAAM (Genetic Algorithm with Aggressive Mutation) for Feature Selection. In: Maglogiannis, I., Iliadis, L., Macintyre, J., Cortez, P. (eds) Artificial Intelligence Applications and Innovations. AIAI 2022. IFIP Advances in Information and Communication Technology, vol 646. Springer, Cham. https://doi.org/10.1007/978-3-031-08333-4_35

DOI: https://doi.org/10.1007/978-3-031-08333-4_35
Published: 10 June 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08332-7
Online ISBN: 978-3-031-08333-4
eBook Packages: Computer Science, Computer Science (R0)
