DOI: 10.1145/1553374.1553415 · ICML Conference Proceedings · research-article

A majorization-minimization algorithm for (multiple) hyperparameter learning

Published: 14 June 2009

ABSTRACT

We present a general Bayesian framework for hyperparameter tuning in L2-regularized supervised learning models. Paradoxically, our algorithm works by first analytically integrating out the hyperparameters from the model. We find a local optimum of the resulting non-convex optimization problem efficiently using a majorization-minimization (MM) algorithm, in which the non-convex problem is reduced to a series of convex L2-regularized parameter estimation tasks. The principal appeal of our method is its simplicity: the updates for choosing the L2-regularized subproblems in each step are trivial to implement (or even perform by hand), and each subproblem can be efficiently solved by adapting existing solvers. Empirical results on a variety of supervised learning models show that our algorithm is competitive with both grid-search and gradient-based algorithms, but is more efficient and far easier to implement.
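The MM recipe the abstract describes can be made concrete in its simplest setting, ridge regression with a single shared hyperparameter. The sketch below is an illustration under assumed details, not the paper's exact formulation: it posits a Gamma(a, b) hyperprior on the regularization precision, so integrating it out leaves a concave log penalty on ||w||^2, and each MM step majorizes that log by its tangent line, reducing the step to an ordinary closed-form ridge solve.

```python
# Minimal sketch of the MM idea, applied to ridge regression.
# The hyperprior parameters `a` and `b`, the single shared
# hyperparameter, and the closed-form subproblem solver are
# assumptions for illustration only.
import numpy as np

def mm_hyperparameter_learning(X, y, a=1.0, b=1.0, n_iters=50):
    """Learn the L2 hyperparameter by majorization-minimization.

    Integrating a Gamma(a, b) hyperprior out of a Gaussian prior on w
    leaves a penalty proportional to (a + d/2) * log(b + ||w||^2 / 2).
    Each MM step upper-bounds the concave log by its tangent, turning
    the step into standard ridge regression with an effective weight
    `lam` whose update is trivial to compute by hand.
    """
    n, d = X.shape
    lam = 1.0          # initial effective L2 weight
    w = np.zeros(d)
    for _ in range(n_iters):
        # Convex subproblem: ridge regression with the current weight.
        w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
        # MM update: slope of the tangent majorizer at the current w.
        lam = (a + d / 2.0) / (b + 0.5 * np.dot(w, w))
    return w, lam

# Usage on synthetic data; the learned lam replaces a grid search
# over regularization strengths.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
w_true = rng.standard_normal(10)
y = X @ w_true + 0.1 * rng.standard_normal(100)
w_hat, lam_hat = mm_hyperparameter_learning(X, y)
```

Note how the sketch mirrors the abstract's claim of simplicity: the hyperparameter update is a one-line formula, and the inner subproblem is exactly the L2-regularized estimation task an existing solver already handles.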

Published in

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009, 1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374
Copyright © 2009 by the author(s)/owner(s).
Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 140 of 548 submissions, 26%
