AutoML Loss Landscapes

Published: 22 November 2022

Abstract

As interest in machine learning and its applications becomes more widespread, choosing the best models and hyper-parameter settings becomes increasingly important. This problem is known to be challenging for human experts, and consequently, a growing number of methods have been proposed for solving it, giving rise to the area of automated machine learning (AutoML). Many of the most popular AutoML methods are based on Bayesian optimization, which makes only weak assumptions about how modifying hyper-parameters affects the loss of a model. This is a safe assumption that yields robust methods, since the AutoML loss landscapes that relate hyper-parameter settings to loss are poorly understood. We build on recent work on the study of one-dimensional slices of algorithm configuration landscapes by introducing new methods that test n-dimensional landscapes for statistical deviations from uni-modality and convexity, and we use them to show that a diverse set of AutoML loss landscapes are highly structured. We introduce a method for assessing the significance of hyper-parameter partial derivatives, which reveals that most (but not all) AutoML loss landscapes have only a small number of hyper-parameters that interact strongly. To further assess hyper-parameter interactions, we introduce a simple optimization procedure that assumes each hyper-parameter can be optimized independently, a single time in sequence, and we show that it obtains configurations that are statistically tied with optimal in all of the n-dimensional AutoML loss landscapes that we studied. Our results suggest many possible new directions for substantially improving the state of the art in AutoML.
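The sequential procedure sketched in the abstract — optimizing each hyper-parameter once, in turn, while holding all others fixed — can be illustrated with a minimal code sketch. This is not the authors' implementation; all names here (`sequential_hpo`, `loss_fn`, `grids`) are illustrative assumptions, and a discrete candidate grid per hyper-parameter stands in for whatever one-dimensional search the paper actually uses.

```python
def sequential_hpo(loss_fn, grids, start):
    """Optimize each hyper-parameter exactly once, in sequence.

    loss_fn: maps a configuration (dict of hyper-parameter values) to a loss
             (lower is better).
    grids:   ordered dict mapping each hyper-parameter name to its candidate
             values.
    start:   initial configuration (dict).
    """
    config = dict(start)
    for name, candidates in grids.items():
        # Evaluate every candidate for this one hyper-parameter, keeping all
        # other hyper-parameters fixed at their current values.
        best_value = min(candidates, key=lambda v: loss_fn({**config, name: v}))
        config[name] = best_value  # fix this hyper-parameter; never revisit it
    return config


# Toy usage on a separable quadratic stand-in for a loss landscape:
loss = lambda c: (c["lr"] - 0.1) ** 2 + (c["depth"] - 5) ** 2
grids = {"lr": [0.001, 0.01, 0.1, 1.0], "depth": [2, 3, 5, 8]}
best = sequential_hpo(loss, grids, {"lr": 0.001, "depth": 2})
# best == {"lr": 0.1, "depth": 5}
```

The procedure is a single pass of coordinate descent: it can only match the true optimum when hyper-parameter interactions are weak, which is exactly why its strong performance in the studied landscapes is evidence that those landscapes are benignly structured.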


Published In

ACM Transactions on Evolutionary Learning and Optimization, Volume 2, Issue 3
September 2022, 89 pages
EISSN: 2688-3007
DOI: 10.1145/3567468

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 November 2022
Online AM: 02 September 2022
Accepted: 30 July 2022
Revised: 06 May 2022
Received: 18 August 2021
Published in TELO Volume 2, Issue 3

Author Tags

  1. Landscape analysis
  2. AutoML
  3. hyper-parameter optimization

Qualifiers

  • Research-article
  • Refereed

Cited By

  • (2025) Integrated model and automatically designed solver for power system restoration. Applied Soft Computing 169, 112525. DOI: 10.1016/j.asoc.2024.112525
  • (2024) Automated machine learning: past, present and future. Artificial Intelligence Review 57, 5. DOI: 10.1007/s10462-024-10726-1
  • (2024) Auto-sktime: Automated Time Series Forecasting. In Learning and Intelligent Optimization, 456–471. DOI: 10.1007/978-3-031-75623-8_35
  • (2024) Contrasting the Landscapes of Feature Selection Under Different Machine Learning Models. In Parallel Problem Solving from Nature – PPSN XVIII, 360–376. DOI: 10.1007/978-3-031-70055-2_22
  • (2024) A Hierarchical Dissimilarity Metric for Automated Machine Learning Pipelines, and Visualizing Search Behaviour. In Applications of Evolutionary Computation, 115–129. DOI: 10.1007/978-3-031-56855-8_7
  • (2023) Hyperparameters in reinforcement learning and how to tune them. In Proceedings of the 40th International Conference on Machine Learning, 9104–9149.
  • (2023) Tackling Industrial Downtimes with Artificial Intelligence in Data-Driven Maintenance. ACM Computing Surveys 56, 4, 1–33. DOI: 10.1145/3623378
  • (2023) Using Automated Algorithm Configuration for Parameter Control. In Proceedings of the 17th ACM/SIGEVO Conference on Foundations of Genetic Algorithms, 38–49. DOI: 10.1145/3594805.3607127
  • (2023) On the Effect of Solution Representation and Neighborhood Definition in AutoML Fitness Landscapes. In Evolutionary Computation in Combinatorial Optimization, 227–243. DOI: 10.1007/978-3-031-30035-6_15
  • (2022) Analysis of Neutrality of AutoML Search Spaces with Local Optima Networks. In Intelligent Systems, 473–487. DOI: 10.1007/978-3-031-21686-2_33