Evaluating Models with Dynamic Sampling Holdout

  • Conference paper
Applications of Evolutionary Computation (EvoApplications 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12694)

Abstract

Automated Machine Learning (Auto-ML) is a growing field in which several techniques are being developed to automate the definition of machine learning pipelines. These techniques use diverse approaches and have had relative success, but the problem is still far from solved. Among the open issues, computational cost is one of the major ones. In this context, evaluating a model takes a lot of time and resources, yet this step has received little attention in the Auto-ML literature.

To address this, this work revisits Auto-CVE (Automated Coevolutionary Voting Ensemble) and proposes a new method for model evaluation: the dynamic sampling holdout. Compared with the regular Auto-CVE with cross-validation and the popular TPOT (Tree-based Pipeline Optimization Tool) algorithm, Auto-CVE with dynamic holdout shows competitive results in both predictive performance and computing time.
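
As a rough illustration of why the evaluation scheme matters for cost, the following minimal Python sketch contrasts the number of model fits needed per candidate under k-fold cross-validation with a single holdout split, and draws a fresh holdout subsample at each generation. The abstract does not describe how the dynamic sampling holdout actually works, so the function names (`holdout_split`, `fits_per_candidate`), the sample size, the test fraction, and the per-generation re-draw are all assumptions made for illustration, not the authors' procedure.

```python
import random


def holdout_split(n_samples, sample_size, test_fraction, seed):
    """Draw a random subsample of row indices and split it into train/test.

    Hypothetical helper: the paper's actual sampling rule is not given
    in the abstract.
    """
    rng = random.Random(seed)
    # Sample without replacement so train and test indices never overlap.
    chosen = rng.sample(range(n_samples), sample_size)
    n_test = max(1, int(round(sample_size * test_fraction)))
    return chosen[n_test:], chosen[:n_test]


def fits_per_candidate(scheme, k=5):
    """Number of model fits needed to score one candidate pipeline."""
    # k-fold cross-validation trains k models; a holdout trains one.
    return k if scheme == "cv" else 1


# Assumed setting: 1000 rows, a 200-row sample re-drawn every generation.
for generation in range(3):
    train_idx, test_idx = holdout_split(1000, 200, 0.3, seed=generation)
    print(generation, len(train_idx), len(test_idx))
```

Under these assumptions, 5-fold cross-validation fits each candidate five times, whereas a holdout fits it once on a smaller sample. The trade-off is a noisier score for any single evaluation.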

Author information

Corresponding author

Correspondence to Celio H. N. Larcher Jr.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Larcher, C.H.N., Barbosa, H.J.C. (2021). Evaluating Models with Dynamic Sampling Holdout. In: Castillo, P.A., Jiménez Laredo, J.L. (eds) Applications of Evolutionary Computation. EvoApplications 2021. Lecture Notes in Computer Science, vol 12694. Springer, Cham. https://doi.org/10.1007/978-3-030-72699-7_46

  • DOI: https://doi.org/10.1007/978-3-030-72699-7_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-72698-0

  • Online ISBN: 978-3-030-72699-7

  • eBook Packages: Computer Science, Computer Science (R0)
