AutoML Loss Landscapes

Published: 22 November 2022

Abstract

As interest in machine learning and its applications becomes more widespread, choosing the best models and hyper-parameter settings becomes increasingly important. This problem is known to be challenging for human experts, and consequently, a growing number of methods have been proposed for solving it, giving rise to the area of automated machine learning (AutoML). Many of the most popular AutoML methods are based on Bayesian optimization, which makes only weak assumptions about how modifying hyper-parameters affects the loss of a model. This is a safe assumption that yields robust methods, since the AutoML loss landscapes that relate hyper-parameter settings to loss are poorly understood. We build on recent work on the study of one-dimensional slices of algorithm configuration landscapes by introducing new methods that test n-dimensional landscapes for statistical deviations from uni-modality and convexity, and we use them to show that a diverse set of AutoML loss landscapes are highly structured. We introduce a method for assessing the significance of hyper-parameter partial derivatives, which reveals that most (but not all) AutoML loss landscapes have only a small number of hyper-parameters that interact strongly. To further assess hyper-parameter interactions, we introduce a simple optimization procedure that assumes each hyper-parameter can be optimized independently, a single time in sequence, and we show that it obtains configurations that are statistically tied with optimal in all of the n-dimensional AutoML loss landscapes that we studied. Our results suggest many possible new directions for substantially improving the state of the art in AutoML.
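The sequential procedure sketched in the abstract — optimizing each hyper-parameter once, in turn, while holding all others fixed — can be illustrated with a minimal code sketch. This is not the authors' implementation; all names here (`sequential_hpo`, `loss_fn`, `grids`) are illustrative assumptions, and a discrete candidate grid per hyper-parameter stands in for whatever one-dimensional search the paper actually uses.

```python
def sequential_hpo(loss_fn, grids, start):
    """Optimize each hyper-parameter exactly once, in sequence.

    loss_fn: maps a configuration (dict of hyper-parameter values) to a loss
             (lower is better).
    grids:   ordered dict mapping each hyper-parameter name to its candidate
             values.
    start:   initial configuration (dict).
    """
    config = dict(start)
    for name, candidates in grids.items():
        # Evaluate every candidate for this one hyper-parameter, keeping all
        # other hyper-parameters fixed at their current values.
        best_value = min(candidates, key=lambda v: loss_fn({**config, name: v}))
        config[name] = best_value  # fix this hyper-parameter; never revisit it
    return config


# Toy usage on a separable quadratic stand-in for a loss landscape:
loss = lambda c: (c["lr"] - 0.1) ** 2 + (c["depth"] - 5) ** 2
grids = {"lr": [0.001, 0.01, 0.1, 1.0], "depth": [2, 3, 5, 8]}
best = sequential_hpo(loss, grids, {"lr": 0.001, "depth": 2})
# best == {"lr": 0.1, "depth": 5}
```

The procedure is a single pass of coordinate descent: it can only match the true optimum when hyper-parameter interactions are weak, which is exactly why its strong performance in the studied landscapes is evidence that those landscapes are benignly structured.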


Published In

ACM Transactions on Evolutionary Learning and Optimization, Volume 2, Issue 3
September 2022, 89 pages
EISSN: 2688-3007
DOI: 10.1145/3567468

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 November 2022
Online AM: 02 September 2022
Accepted: 30 July 2022
Revised: 06 May 2022
Received: 18 August 2021
Published in TELO Volume 2, Issue 3

Author Tags

  1. Landscape analysis
  2. AutoML
  3. hyper-parameter optimization

Qualifiers

  • Research-article
  • Refereed

Cited By

  • (2025) Integrated model and automatically designed solver for power system restoration. Applied Soft Computing 169, 112525. DOI: 10.1016/j.asoc.2024.112525
  • (2024) Automated machine learning: past, present and future. Artificial Intelligence Review 57, 5. DOI: 10.1007/s10462-024-10726-1
  • (2024) Auto-sktime: Automated Time Series Forecasting. In Learning and Intelligent Optimization, 456–471. DOI: 10.1007/978-3-031-75623-8_35
  • (2024) Contrasting the Landscapes of Feature Selection Under Different Machine Learning Models. In Parallel Problem Solving from Nature – PPSN XVIII, 360–376. DOI: 10.1007/978-3-031-70055-2_22
  • (2024) A Hierarchical Dissimilarity Metric for Automated Machine Learning Pipelines, and Visualizing Search Behaviour. In Applications of Evolutionary Computation, 115–129. DOI: 10.1007/978-3-031-56855-8_7
  • (2023) Hyperparameters in reinforcement learning and how to tune them. In Proceedings of the 40th International Conference on Machine Learning, 9104–9149.
  • (2023) Tackling Industrial Downtimes with Artificial Intelligence in Data-Driven Maintenance. ACM Computing Surveys 56, 4, 1–33. DOI: 10.1145/3623378
  • (2023) Using Automated Algorithm Configuration for Parameter Control. In Proceedings of the 17th ACM/SIGEVO Conference on Foundations of Genetic Algorithms, 38–49. DOI: 10.1145/3594805.3607127
  • (2023) On the Effect of Solution Representation and Neighborhood Definition in AutoML Fitness Landscapes. In Evolutionary Computation in Combinatorial Optimization, 227–243. DOI: 10.1007/978-3-031-30035-6_15
  • (2022) Analysis of Neutrality of AutoML Search Spaces with Local Optima Networks. In Intelligent Systems, 473–487. DOI: 10.1007/978-3-031-21686-2_33