Evaluating Models with Dynamic Sampling Holdout

Larcher, Celio H. N.; Barbosa, Helio J. C.

doi:10.1007/978-3-030-72699-7_46

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12694)

Included in the following conference series:

International Conference on the Applications of Evolutionary Computation (Part of EvoStar)

Abstract

Automated Machine Learning (Auto-ML) is a growing field in which several techniques are being developed to automate the definition of machine learning pipelines. These techniques take diverse approaches and have achieved relative success, but the problem is still far from solved. Among the open issues, computational cost is one of the most significant. Evaluating a model consumes substantial time and resources, yet this step has received little attention in the Auto-ML literature.

This work revisits Auto-CVE (Automated Coevolutionary Voting Ensemble) and proposes a new method for model evaluation: the dynamic sampling holdout. Compared with the regular Auto-CVE using cross-validation and with the popular TPOT (Tree-based Pipeline Optimization Tool) algorithm, Auto-CVE with dynamic holdout shows competitive results in both predictive performance and computing time.
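
The abstract does not spell out how the dynamic sampling holdout works, so the following is only a rough sketch of why a holdout on a sampled subset can be cheaper than cross-validation. It is not the authors' procedure. The function names (`kfold_splits`, `sampled_holdout_split`, `training_cost`), the sample size, the test fraction, and the idea of redrawing the sample with a new seed are all assumptions made for illustration.

```python
import random


def kfold_splits(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for plain k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


def sampled_holdout_split(n_samples, sample_size, test_fraction=0.3, seed=0):
    """Draw a random subset of the data and split it once into train/test.

    Hypothetical scheme: calling this again with a different seed redraws
    the subset, which is what makes the sample "dynamic" in this sketch.
    """
    rng = random.Random(seed)
    sample = rng.sample(range(n_samples), min(sample_size, n_samples))
    n_test = max(1, int(len(sample) * test_fraction))
    return sample[n_test:], sample[:n_test]


def training_cost(splits):
    """Total number of training rows processed across all model fits."""
    return sum(len(train) for train, _ in splits)


n = 10_000
cv_cost = training_cost(kfold_splits(n, k=5))
ho_cost = training_cost([sampled_holdout_split(n, sample_size=2_000, seed=1)])
print(cv_cost, ho_cost)  # 40000 1400
```

With these illustrative numbers, 5-fold cross-validation processes 40,000 training rows per candidate model, while a single holdout on a 2,000-row sample processes 1,400. The trade-off is a noisier performance estimate, which is the kind of cost versus reliability question the paper's comparison addresses.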

Author information

Authors and Affiliations

Laboratório Nacional de Computação Científica, Petrópolis, RJ, Brazil
Celio H. N. Larcher Jr & Helio J. C. Barbosa

Corresponding author

Correspondence to Celio H. N. Larcher Jr.

Editor information

Editors and Affiliations

ETSIIT-CITIC, University of Granada, Granada, Spain
Pedro A. Castillo
Université Le Havre Normandie, Le Havre, France
Juan Luis Jiménez Laredo

About this paper

Cite this paper

Larcher, C.H.N., Barbosa, H.J.C. (2021). Evaluating Models with Dynamic Sampling Holdout. In: Castillo, P.A., Jiménez Laredo, J.L. (eds) Applications of Evolutionary Computation. EvoApplications 2021. Lecture Notes in Computer Science, vol 12694. Springer, Cham. https://doi.org/10.1007/978-3-030-72699-7_46

DOI: https://doi.org/10.1007/978-3-030-72699-7_46
Published: 01 April 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-72698-0
Online ISBN: 978-3-030-72699-7
eBook Packages: Computer Science, Computer Science (R0)
