Abstract
The paper investigates the hyperparameter optimization problem. Hyperparameters are the parameters of the model parameter distribution. An adequate choice of hyperparameter values prevents model overfitting and allows the model to achieve higher predictive performance. The paper analyzes neural network models with a large number of hyperparameters, for which hyperparameter optimization is computationally expensive. It proposes modifications of several gradient-based methods that optimize many hyperparameters simultaneously, and compares the experimental results with random search. The main contribution of the paper is an analysis of hyperparameter optimization algorithms for models with a large number of parameters. To select accurate and stable models, the authors suggest two model selection criteria: cross-validation and the evidence lower bound. The experiments show that models optimized with the evidence lower bound yield a higher error rate than models obtained with cross-validation, but are more stable when the data are noisy. Using the evidence lower bound is preferable when the model tends to overfit or when cross-validation is computationally expensive. The algorithms are evaluated on regression and classification datasets.
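The general idea of gradient-based hyperparameter optimization can be illustrated with a minimal sketch. This is a hypothetical example, not the authors' code: it tunes a single L2 regularization hyperparameter for ridge regression by descending the validation loss with a numerical hypergradient, whereas the paper's methods optimize many hyperparameters of deep models via automatic differentiation.

```python
import numpy as np

# Synthetic train/validation split for a linear model with noise.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 5))
w_true = rng.normal(size=5)
y_train = X_train @ w_true + 0.5 * rng.normal(size=50)
X_val = rng.normal(size=(50, 5))
y_val = X_val @ w_true + 0.5 * rng.normal(size=50)

def inner_solution(lam):
    # Closed-form ridge solution: argmin_w ||X w - y||^2 + lam ||w||^2.
    d = X_train.shape[1]
    return np.linalg.solve(X_train.T @ X_train + lam * np.eye(d),
                           X_train.T @ y_train)

def val_loss(lam):
    # Outer objective: mean squared error on held-out data.
    w = inner_solution(lam)
    return np.mean((X_val @ w - y_val) ** 2)

# Gradient descent on log(lambda), so lambda stays positive.
log_lam, lr, eps = 0.0, 0.5, 1e-4
for _ in range(200):
    # Central-difference hypergradient of the validation loss w.r.t. log(lambda).
    g = (val_loss(np.exp(log_lam + eps)) -
         val_loss(np.exp(log_lam - eps))) / (2 * eps)
    log_lam -= lr * g

lam_opt = np.exp(log_lam)
```

Working in log-space is a common design choice for scale hyperparameters such as regularization coefficients; in the paper's setting the inner problem has no closed form, so the hypergradient is propagated through the optimization trajectory itself.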
The research was made possible by the Government of the Russian Federation (Agreement No. 05.Y09.21.0018) and FASIE Project No. 44116. This paper contains results of the project Statistical methods of machine learning, carried out within the framework of the Program “Center of Big Data Storage and Analysis” of the National Technology Initiative Competence Center. It is supported by the Ministry of Science and Higher Education of the Russian Federation under the agreement between M.V. Lomonosov Moscow State University and the Foundation for Project Support of the National Technology Initiative dated 11 December 2018, No. 13/1251/2018.
Cite this article
Bakhteev, O.Y., Strijov, V.V. Comprehensive analysis of gradient-based hyperparameter optimization algorithms. Ann Oper Res 289, 51–65 (2020). https://doi.org/10.1007/s10479-019-03286-z