Abstract
Automated Machine Learning (Auto-ML) is a growing field in which a range of techniques are being developed to automate the definition of machine learning pipelines. These approaches have had relative success, but the problem is far from solved. Among the open issues, computational cost is one of the most significant: evaluating a model takes considerable time and resources, yet this step has received little attention in the Auto-ML literature.
This work revisits Auto-CVE (Automated Coevolutionary Voting Ensemble) and proposes a new method for model evaluation: the dynamic sampling holdout. Compared with the regular Auto-CVE using cross-validation and with the popular TPOT (Tree-based Pipeline Optimization Tool) algorithm, Auto-CVE with dynamic holdout shows competitive results in both predictive performance and computing time.
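The abstract does not spell out how the dynamic sampling holdout works, so the following is only a rough sketch of the general trade-off it alludes to: k-fold cross-validation needs k model fits per candidate, whereas a holdout split needs one. The function names, the 30% validation fraction, and the idea of redrawing the split with a per-round seed are assumptions made for illustration, not the authors' procedure.

```python
import random


def kfold_splits(n_samples, k):
    """Yield (train_idx, val_idx) pairs for plain k-fold cross-validation."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


def resampled_holdout(n_samples, val_fraction, round_id, base_seed=0):
    """Return one (train_idx, val_idx) split, redrawn for each round_id.

    Hypothetical helper: reseeding per round is an assumption of this sketch.
    """
    rng = random.Random(base_seed + round_id)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_val = max(1, round(n_samples * val_fraction))
    return indices[n_val:], indices[:n_val]


def count_fits(splits):
    """Number of model fits one candidate evaluation would need."""
    return sum(1 for _ in splits)


# One candidate pipeline evaluated on 100 samples under each scheme.
cv_fits = count_fits(kfold_splits(100, k=5))
holdout_fits = count_fits([resampled_holdout(100, 0.3, round_id=0)])
```

Under these assumptions, each candidate costs five fits with 5-fold cross-validation and a single fit with the holdout, while changing `round_id` gives a different validation subset at each evaluation round.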
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Larcher, C.H.N., Barbosa, H.J.C. (2021). Evaluating Models with Dynamic Sampling Holdout. In: Castillo, P.A., Jiménez Laredo, J.L. (eds) Applications of Evolutionary Computation. EvoApplications 2021. Lecture Notes in Computer Science(), vol 12694. Springer, Cham. https://doi.org/10.1007/978-3-030-72699-7_46
DOI: https://doi.org/10.1007/978-3-030-72699-7_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-72698-0
Online ISBN: 978-3-030-72699-7
eBook Packages: Computer Science (R0)