DOI: 10.1145/1553374.1553415
research-article

A majorization-minimization algorithm for (multiple) hyperparameter learning

Published: 14 June 2009

Abstract

We present a general Bayesian framework for hyperparameter tuning in L2-regularized supervised learning models. Paradoxically, our algorithm works by first analytically integrating out the hyperparameters from the model. We find a local optimum of the resulting non-convex optimization problem efficiently using a majorization-minimization (MM) algorithm, in which the non-convex problem is reduced to a series of convex L2-regularized parameter estimation tasks. The principal appeal of our method is its simplicity: the updates for choosing the L2-regularized subproblems in each step are trivial to implement (or even perform by hand), and each subproblem can be efficiently solved by adapting existing solvers. Empirical results on a variety of supervised learning models show that our algorithm is competitive with both grid-search and gradient-based algorithms, but is more efficient and far easier to implement.
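
The reduction described in the abstract, from one non-convex problem to a series of convex L2-regularized subproblems, can be illustrated with a minimal sketch. The abstract does not give the objective or the update rule, so everything specific here is an assumption: a squared loss, a log penalty `a * log(b + 0.5 * ||w||^2)` of the kind that can arise when a regularization hyperparameter is integrated out, and the hypothetical constants `a` and `b`. Because the log is concave in `||w||^2`, its tangent line is an upper bound, and minimizing that bound is an ordinary ridge regression whose strength is recomputed at every step.

```python
import numpy as np

def objective(w, X, y, a, b):
    """Assumed non-convex objective: squared loss plus a log penalty on ||w||^2."""
    r = X @ w - y
    return 0.5 * r @ r + a * np.log(b + 0.5 * w @ w)

def mm_fit(X, y, a=1.0, b=1.0, n_iter=50, tol=1e-10):
    """MM loop: each step solves a ridge problem with a re-weighted L2 penalty."""
    n_features = X.shape[1]
    w = np.zeros(n_features)
    history = [objective(w, X, y, a, b)]
    for _ in range(n_iter):
        # The tangent of the concave log at the current iterate is an upper
        # bound; it yields the effective L2 strength for this step.
        lam = a / (b + 0.5 * w @ w)
        # Convex subproblem: min 0.5*||Xw - y||^2 + 0.5*lam*||w||^2 (ridge).
        w = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)
        history.append(objective(w, X, y, a, b))
        # Stop once the objective no longer decreases meaningfully.
        if history[-2] - history[-1] < tol:
            break
    return w, lam, history

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
y = X @ w_true + 0.1 * rng.normal(size=40)

w_hat, lam_hat, history = mm_fit(X, y)
```

In this sketch the regularization strength is never searched over a grid; it follows from the current weights, and the recorded objective values in `history` do not increase from one step to the next. The closed-form ridge solve stands in for whatever existing L2-regularized solver a given model would use.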

Published In

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009
1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374

Sponsors

  • NSF
  • Microsoft Research
  • MITACS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

ICML '09
Sponsor:
  • Microsoft Research

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 7
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 08 Feb 2025

Cited By

  • (2024) Compressed sensing with log-sum heuristic recover for seismic denoising. Frontiers in Earth Science, 11. DOI: 10.3389/feart.2023.1285622. Online publication date: 9-Jan-2024
  • (2020) Improved Covariance Matrix Estimation With an Application in Portfolio Optimization. IEEE Signal Processing Letters, 27 (985-989). DOI: 10.1109/LSP.2020.2996060. Online publication date: 2020
  • (2017) An Iterative Parameter-Free MAP Algorithm With an Application to Forward Looking GPR Imaging. IEEE Transactions on Geoscience and Remote Sensing, 55:3 (1573-1586). DOI: 10.1109/TGRS.2016.2627500. Online publication date: Mar-2017
  • (2016) A Unified View of Nonconvex Heuristic Approach for Low-Rank and Sparse Structure Learning. Handbook of Robust Low-Rank and Sparse Matrix Decomposition (13-1-13-19). DOI: 10.1201/b20190-14. Online publication date: 16-Jun-2016
  • (2015) A Unified Framework for Epidemic Prediction based on Poisson Regression. IEEE Transactions on Knowledge and Data Engineering, 27:11 (2878-2892). DOI: 10.1109/TKDE.2015.2436918. Online publication date: 1-Nov-2015
  • (2015) A parameter-free MAP image reconstruction algorithm for impulse-based UWB ground penetrating radar. 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP) (742-746). DOI: 10.1109/GlobalSIP.2015.7418295. Online publication date: Dec-2015
  • (2015) A probabilistic model for latent least squares regression. Neurocomputing, 149 (1155-1161). DOI: 10.1016/j.neucom.2014.09.014. Online publication date: Feb-2015
  • (2014) Discrete graph hashing. Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2 (3419-3427). DOI: 10.5555/2969033.2969208. Online publication date: 8-Dec-2014
  • (2014) A model-based framework for fast dynamic image sampling. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (1822-1826). DOI: 10.1109/ICASSP.2014.6853913. Online publication date: May-2014
  • (2014) Sparse Structure for Visual Information Sensing: Theory and Algorithms. High-Dimensional and Low-Quality Visual Information Processing (9-28). DOI: 10.1007/978-3-662-44526-6_2. Online publication date: 5-Sep-2014
