Evolutionary Automated Feature Engineering

Zhu, Guanghui; Jiang, Shen; Guo, Xu; Yuan, Chunfeng; Huang, Yihua

doi:10.1007/978-3-031-20862-1_42

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13629)

Included in the following conference series:

Pacific Rim International Conference on Artificial Intelligence

Abstract

Effective feature engineering is a prerequisite for many machine learning tasks. Feature engineering, which usually applies a series of mathematical functions to transform the features, aims to find valuable new features that reveal insights in the data. Traditional feature engineering is labor-intensive and time-consuming: it depends on expert domain knowledge and proceeds iteratively by trial and error. In recent years, many automated feature engineering (AutoFE) methods have been proposed. These methods automatically transform the original features into a set of new features to improve the performance of the machine learning model. However, existing methods either suffer from computational bottlenecks or do not support high-order transformations and various feature types. In this paper, we propose EAAFE, which is, to the best of our knowledge, the first evolutionary algorithm-based automated feature engineering method. We first formalize the AutoFE problem as a search for the optimal feature transformation sequence. Then, we leverage roulette wheel selection, subsequence-exchange-based DNA crossover, and \(\epsilon\)-greedy-based DNA mutation to drive the evolution. Despite its simplicity, EAAFE is flexible and effective: it supports feature transformations for both numerical and categorical features as well as high-order feature transformations. Extensive experimental results on public datasets demonstrate that EAAFE outperforms existing AutoFE methods in both effectiveness and efficiency.
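
The three evolutionary operators named in the abstract can be illustrated with a minimal Python sketch. This is not the EAAFE implementation: the operator vocabulary `OPS`, the per-operator score table `op_scores`, the mutation `rate`, and the toy fitness function are all hypothetical, and the reading of "\(\epsilon\)-greedy mutation" used here (explore a random operator with probability \(\epsilon\), otherwise pick the operator with the best score seen so far) is an assumption, since the abstract does not define it. In a real setting the fitness would be a model's validation score on the transformed features.

```python
import random

# Hypothetical transformation vocabulary (not taken from the paper).
OPS = ["log", "sqrt", "square", "reciprocal", "none"]

def roulette_select(population, fitness, k, rng):
    """Draw k individuals with probability proportional to fitness."""
    if sum(fitness) <= 0:  # degenerate case: fall back to uniform sampling
        return [rng.choice(population) for _ in range(k)]
    return rng.choices(population, weights=fitness, k=k)

def crossover(a, b, rng):
    """Swap one random contiguous subsequence between two equal-length parents."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def mutate(seq, op_scores, epsilon, rate, rng):
    """Mutate each gene with probability `rate`; the replacement is chosen
    epsilon-greedily: random with probability epsilon, else the best-scored op."""
    best = max(op_scores, key=op_scores.get)
    out = []
    for gene in seq:
        if rng.random() < rate:
            gene = rng.choice(OPS) if rng.random() < epsilon else best
        out.append(gene)
    return out

def evolve(fitness_fn, seq_len=4, pop_size=8, generations=5,
           epsilon=0.2, rate=0.3, seed=0):
    """Run a small evolutionary search; pop_size is assumed to be even."""
    rng = random.Random(seed)
    population = [[rng.choice(OPS) for _ in range(seq_len)]
                  for _ in range(pop_size)]
    op_scores = {op: 0.0 for op in OPS}
    best_seq, best_fit = None, float("-inf")
    for _ in range(generations):
        fitness = [fitness_fn(s) for s in population]
        for s, f in zip(population, fitness):
            if f > best_fit:
                best_seq, best_fit = list(s), f
            for op in s:  # remember the best fitness each op took part in
                op_scores[op] = max(op_scores[op], f)
        parents = roulette_select(population, fitness, pop_size, rng)
        population = []
        for a, b in zip(parents[0::2], parents[1::2]):
            c, d = crossover(a, b, rng)
            population.append(mutate(c, op_scores, epsilon, rate, rng))
            population.append(mutate(d, op_scores, epsilon, rate, rng))
    return best_seq, best_fit

def toy_fitness(seq):
    """Toy stand-in for a validation score: rewards sequences containing 'log'."""
    return 1 + seq.count("log")

if __name__ == "__main__":
    print(evolve(toy_fitness))
```

Each individual here is one transformation sequence of fixed length; representing a high-order transformation as a sequence of operators applied in order is what lets a subsequence exchange produce valid offspring without any repair step.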

Notes

1. EAAFE is available at https://github.com/PasaLab/EAAFE.
2. https://www.openml.org/
3. https://archive.ics.uci.edu/
4. https://www.kaggle.com/

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China (No. 62102177 and No. U1811461), the Natural Science Foundation of Jiangsu Province (No. BK20210181), the Key R&D Program of Jiangsu Province (No. BE2021729), and the Collaborative Innovation Center of Novel Software Technology and Industrialization, Jiangsu, China.

Author information

Authors and Affiliations

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Guanghui Zhu, Shen Jiang, Xu Guo, Chunfeng Yuan & Yihua Huang

Corresponding author

Correspondence to Guanghui Zhu.

Editor information

Editors and Affiliations

CSIRO Australian e-Health Research Centre, Brisbane, QLD, Australia
Sankalp Khanna
Shanghai Jiao Tong University, Shanghai, China
Jian Cao
University of Tasmania, Hobart, TAS, Australia
Quan Bai
University of Technology Sydney, Sydney, NSW, Australia
Guandong Xu

About this paper

Cite this paper

Zhu, G., Jiang, S., Guo, X., Yuan, C., Huang, Y. (2022). Evolutionary Automated Feature Engineering. In: Khanna, S., Cao, J., Bai, Q., Xu, G. (eds) PRICAI 2022: Trends in Artificial Intelligence. PRICAI 2022. Lecture Notes in Computer Science, vol 13629. Springer, Cham. https://doi.org/10.1007/978-3-031-20862-1_42

DOI: https://doi.org/10.1007/978-3-031-20862-1_42
Published: 04 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20861-4
Online ISBN: 978-3-031-20862-1
eBook Packages: Computer Science, Computer Science (R0)
