ABSTRACT
Many learning algorithms rely on the curvature (in particular, strong convexity) of regularized objective functions to provide good theoretical performance guarantees. In practice, however, the regularization penalty that gives the best test-set performance may yield an objective with little or even no curvature. In these cases, algorithms designed specifically for strongly convex regularized objectives often either fail outright or require modifications that substantially compromise performance.
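For concreteness, the standard regularized risk objective these methods target has the form below; this is the usual formulation, not a display taken from the paper itself. The (λ/2)‖w‖² term is what supplies strong convexity of modulus λ, so guarantees that scale as O((G²/λ) log T), such as the logarithmic regret bounds of Hazan et al. (2007), degenerate as the best-performing penalty λ shrinks toward zero.

```latex
% Standard lambda-strongly convex regularized risk minimization:
% the (lambda/2)||w||^2 term contributes curvature lambda, so
% O((G^2/lambda) log T)-style guarantees blow up as lambda -> 0.
\[
  \min_{w} \; f(w)
  \;=\; \frac{\lambda}{2}\,\|w\|^{2}
  \;+\; \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(w;\, x_i, y_i\bigr)
\]
```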
We present new online and batch algorithms for training a variety of supervised learning models (such as SVMs, logistic regression, structured prediction models, and CRFs) under conditions where the optimal choice of regularization parameter results in functions with low curvature. We employ a technique called proximal regularization, in which the original learning problem is solved via a sequence of modified optimization tasks whose objectives are chosen to have greater curvature than the original problem. Theoretically, our algorithms achieve low regret bounds in the online setting and fast convergence in the batch setting. Experimentally, our algorithms improve upon state-of-the-art techniques, including Pegasos and bundle methods, on medium- and large-scale SVM and structured learning tasks.
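To illustrate the core idea, here is a minimal sketch of an online subgradient method for a linear SVM in which each round optimizes the proximally regularized objective f_t(w) + (μ/2)‖w − w_t‖². This is not the authors' exact algorithm: the function name `proximal_online_svm`, the parameter `mu`, the (λ + μ)t step-size schedule, and the toy data are all illustrative assumptions.

```python
import numpy as np

def proximal_online_svm(X, Y, lam=1e-6, mu=1.0):
    """Sketch: at round t, take a subgradient step on the modified objective
        f_t(w) + (mu/2) * ||w - w_t||^2,
    where f_t(w) = (lam/2)*||w||^2 + max(0, 1 - Y[t] * <w, X[t]>).
    Even when lam is tiny (little curvature in the original problem),
    the added proximal term makes each subproblem (lam + mu)-strongly convex."""
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n + 1):
        x, y = X[t - 1], Y[t - 1]
        # Subgradient of f_t at the current iterate w_t. The proximal term
        # (mu/2)*||w - w_t||^2 has zero gradient at w = w_t, so it leaves the
        # step direction unchanged; its curvature only shrinks the step size.
        g = lam * w
        if y * w.dot(x) < 1.0:
            g = g - y * x
        # Step size driven by the accumulated curvature (lam + mu) * t, the
        # schedule suggested by strong-convexity analyses; with lam alone,
        # the 1/(lam * t) schedule would blow up as lam -> 0.
        eta = 1.0 / ((lam + mu) * t)
        w = w - eta * g
    return w

if __name__ == "__main__":
    # Toy usage on nearly separable synthetic data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    Y = np.sign(X[:, 0] + 0.1 * rng.normal(size=200))
    w = proximal_online_svm(X, Y)
    print("train accuracy:", np.mean(np.sign(X @ w) == Y))
```

Note the design point the sketch makes concrete: because the proximal term is centered at the current iterate, it alters only the effective step size, so the added curvature comes essentially for free in each update.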
REFERENCES
- Abernethy, J., Bartlett, P. L., Rakhlin, A., & Tewari, A. (2008). Optimal strategies and minimax lower bounds for online convex games. Proceedings of the 21st Annual Conference on Computational Learning Theory.
- Bartlett, P., Hazan, E., & Rakhlin, A. (2008). Adaptive online gradient descent. In J. Platt, D. Koller, Y. Singer and S. Roweis (Eds.), Advances in Neural Information Processing Systems 20, 65--72. MIT Press.
- Chapelle, O., Le, Q. V., & Smola, A. J. (2007). Large margin optimization of ranking measures. NIPS Workshop: Machine Learning for Web Search.
- Do, C. B., Woods, D. A., & Batzoglou, S. (2006). CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics, 22, e90--e98.
- Hazan, E., Agarwal, A., & Kale, S. (2007). Logarithmic regret algorithms for online convex optimization. Machine Learning, 69, 169--192.
- Joachims, T. (2006). Training linear SVMs in linear time. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 217--226).
- Kiwiel, K. C. (1983). Proximity control in bundle methods for convex nondifferentiable minimization. Mathematical Programming, 27, 320--341.
- Lemaréchal, C., Nemirovskii, A., & Nesterov, Y. (1995). New variants of bundle methods. Mathematical Programming, 69, 111--147.
- Schramm, H., & Zowe, J. (1992). A version of the bundle idea for minimizing a nonsmooth function: conceptual idea, convergence analysis, numerical results. SIAM Journal on Optimization, 2, 121--152.
- Shalev-Shwartz, S., & Kakade, S. M. (2009). Mind the duality gap: Logarithmic regret algorithms for online optimization. In D. Koller, D. Schuurmans, Y. Bengio and L. Bottou (Eds.), Advances in Neural Information Processing Systems 21, 1457--1464. MIT Press.
- Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. Proceedings of the 24th Annual International Conference on Machine Learning (pp. 807--814).
- Smola, A., Vishwanathan, S. V. N., & Le, Q. (2008). Bundle methods for machine learning. In J. Platt, D. Koller, Y. Singer and S. Roweis (Eds.), Advances in Neural Information Processing Systems 20, 1377--1384. MIT Press.
- Teo, C. H., Smola, A., Vishwanathan, S. V., & Le, Q. V. (2007). A scalable modular convex solver for regularized risk minimization. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 727--736).
- Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. Proceedings of the 20th Annual International Conference on Machine Learning.
Index Terms
- Proximal regularization for online and batch learning