Seeding Initial Population, in Genetic Algorithm for Features Selection

  • Conference paper
Proceedings of the 12th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2020)

Abstract

Feature selection is a difficult task that can be tackled by various algorithms. Our work uses genetic algorithms (GA), a subclass of metaheuristic algorithms, to select the subset of features that gives a machine learning algorithm its best results, measured by accuracy. GA are easy to implement and understand, and their results are readily explainable. However, they are not guaranteed to find the globally best solution to a given problem, only the best solution encountered during the search. To improve the performance of GA, we introduce two methods for seeding the initial population of the GA, both of which rely on a Random Forest algorithm. The two methods are applied to two different GA, using Bayesian networks as the classifier to evaluate accuracy. The tests are run on five datasets, and the two methods are compared with other dimensionality reduction techniques. Our results show better convergence of the genetic algorithms when they are seeded.
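The abstract does not spell out the two seeding methods, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes each GA individual is a bit mask over the features and that per-feature importance scores are already available; the scores below are hypothetical stand-ins for what a random forest would produce. The function names, the clipping bounds `floor` and `ceil`, and the population size are illustrative assumptions.

```python
import random


def seeded_population(importances, pop_size, rng, floor=0.05, ceil=0.95):
    """Build bit-mask individuals biased by feature importance.

    Feature i is switched on with a probability that grows with its
    importance score. The probability is clipped to [floor, ceil] so
    that no feature is always kept or always dropped (an assumption
    made here to preserve diversity in the population).
    """
    top = max(importances)
    if top <= 0:
        raise ValueError("at least one importance must be positive")
    probs = [min(ceil, max(floor, imp / top)) for imp in importances]
    return [[1 if rng.random() < p else 0 for p in probs]
            for _ in range(pop_size)]


def random_population(n_features, pop_size, rng):
    """Unseeded baseline: every bit is on with probability 1/2."""
    return [[rng.randint(0, 1) for _ in range(n_features)]
            for _ in range(pop_size)]


def selection_rate(population, feature):
    """Fraction of individuals that include the given feature."""
    return sum(ind[feature] for ind in population) / len(population)


# Hypothetical importance scores, standing in for random forest output.
importances = [0.40, 0.30, 0.15, 0.10, 0.03, 0.02]

rng = random.Random(0)
seeded = seeded_population(importances, pop_size=200, rng=rng)
uniform = random_population(len(importances), pop_size=200, rng=rng)

for i in range(len(importances)):
    print(i, selection_rate(seeded, i), selection_rate(uniform, i))
```

In the seeded population, features with high scores appear in most individuals and low-scoring features appear rarely, whereas the unseeded baseline includes every feature about half the time. Either population could then be handed to the same GA loop, which makes the effect of seeding on convergence easy to compare.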

Supported by Synaltic.

Author information

Corresponding author

Correspondence to Marc Chevallier.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chevallier, M., Rogovschi, N., Boufarès, F., Grozavu, N., Clairmont, C. (2021). Seeding Initial Population, in Genetic Algorithm for Features Selection. In: Abraham, A., et al. Proceedings of the 12th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2020). SoCPaR 2020. Advances in Intelligent Systems and Computing, vol 1383. Springer, Cham. https://doi.org/10.1007/978-3-030-73689-7_55
