Evolutionary Automated Feature Engineering

  • Conference paper
PRICAI 2022: Trends in Artificial Intelligence (PRICAI 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13629)

Abstract

Effective feature engineering is a prerequisite for many machine learning tasks. Feature engineering, which usually applies a series of mathematical functions to transform the features, aims to find valuable new features that capture informative aspects of the data. Traditional feature engineering is labor-intensive and time-consuming: it depends on expert domain knowledge and proceeds iteratively by trial and error. In recent years, many automated feature engineering (AutoFE) methods have been proposed. These methods automatically transform the original features into a set of new features to improve the performance of the machine learning model. However, existing methods either suffer from computational bottlenecks or do not support high-order transformations and various feature types. In this paper, we propose EAAFE, which is, to the best of our knowledge, the first evolutionary algorithm-based automated feature engineering method. We first formalize the AutoFE problem as a search for the optimal feature transformation sequence. Then, we leverage roulette wheel selection, subsequence-exchange-based DNA crossover, and \(\epsilon\)-greedy-based DNA mutation to achieve evolution. Despite its simplicity, EAAFE is flexible and effective: it supports feature transformations for both numerical and categorical features, as well as high-order feature transformations. Extensive experimental results on public datasets demonstrate that EAAFE outperforms existing AutoFE methods in both effectiveness and efficiency.
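
The three evolutionary operators named in the abstract can be illustrated with a minimal sketch. It is not the EAAFE implementation: the operator vocabulary `OPS`, the per-operator score table `op_scores`, the toy fitness function, and all function names are assumptions introduced here for illustration, and the exact mutation rule in the paper may differ. In a real setting the fitness would presumably be a validation score of a model trained on the transformed features.

```python
import random

# Hypothetical transformation vocabulary (an assumption, not the paper's operator set).
OPS = ["log", "sqrt", "square", "reciprocal", "identity"]

def roulette_select(population, fitness, k, rng):
    """Roulette wheel selection: draw k individuals with probability
    proportional to their (non-negative) fitness."""
    return rng.choices(population, weights=fitness, k=k)

def crossover(a, b, rng):
    """Subsequence exchange: swap one contiguous slice between two
    equal-length transformation sequences."""
    i = rng.randrange(len(a))             # slice start, 0 .. len-1
    j = rng.randrange(i + 1, len(a) + 1)  # slice end (exclusive), i+1 .. len
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def mutate(seq, op_scores, epsilon, rng):
    """Epsilon-greedy mutation of one position: explore a random operator
    with probability epsilon, otherwise exploit the best-scoring one."""
    pos = rng.randrange(len(seq))
    if rng.random() < epsilon:
        new_op = rng.choice(OPS)
    else:
        new_op = max(OPS, key=lambda op: op_scores.get(op, 0.0))
    return seq[:pos] + [new_op] + seq[pos + 1:]

def evolve(fitness_fn, seq_len=4, pop_size=8, generations=10, epsilon=0.2, seed=0):
    """Run the select / crossover / mutate loop; pop_size is assumed even."""
    rng = random.Random(seed)
    population = [[rng.choice(OPS) for _ in range(seq_len)] for _ in range(pop_size)]
    op_scores = {}
    best = max(population, key=fitness_fn)
    for _ in range(generations):
        fitness = [fitness_fn(ind) for ind in population]
        # Remember the best fitness observed for each operator (assumed heuristic).
        for ind, f in zip(population, fitness):
            for op in ind:
                op_scores[op] = max(op_scores.get(op, 0.0), f)
        # A small floor keeps the weights valid when every fitness is zero.
        parents = roulette_select(population, [f + 1e-9 for f in fitness], pop_size, rng)
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            children.extend(crossover(a, b, rng))
        population = [mutate(c, op_scores, epsilon, rng) for c in children]
        best = max(population + [best], key=fitness_fn)
    return best

def toy_fitness(seq):
    """Toy stand-in for a model score: fraction of positions equal to 'log'."""
    return seq.count("log") / len(seq)

if __name__ == "__main__":
    print(evolve(toy_fitness))
```

Each individual here is a single transformation sequence; applying a sequence repeatedly to its own output is what makes the resulting features high-order.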

Notes

  1. EAAFE is available at https://github.com/PasaLab/EAAFE.

  2. https://www.openml.org/.

  3. https://archive.ics.uci.edu/.

  4. https://www.kaggle.com/.

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China (No. 62102177 and No. U1811461), the Natural Science Foundation of Jiangsu Province (No. BK20210181), the Key R&D Program of Jiangsu Province (No. BE2021729), and the Collaborative Innovation Center of Novel Software Technology and Industrialization, Jiangsu, China.

Author information

Corresponding author

Correspondence to Guanghui Zhu.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhu, G., Jiang, S., Guo, X., Yuan, C., Huang, Y. (2022). Evolutionary Automated Feature Engineering. In: Khanna, S., Cao, J., Bai, Q., Xu, G. (eds) PRICAI 2022: Trends in Artificial Intelligence. PRICAI 2022. Lecture Notes in Computer Science, vol 13629. Springer, Cham. https://doi.org/10.1007/978-3-031-20862-1_42

  • DOI: https://doi.org/10.1007/978-3-031-20862-1_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20861-4

  • Online ISBN: 978-3-031-20862-1

  • eBook Packages: Computer Science, Computer Science (R0)
