DOI: 10.1145/1553374.1553407
research-article

Proximal regularization for online and batch learning

Published: 14 June 2009

ABSTRACT

Many learning algorithms rely on the curvature (in particular, strong convexity) of regularized objective functions to provide good theoretical performance guarantees. In practice, the choice of regularization penalty that gives the best testing set performance may result in objective functions with little or even no curvature. In these cases, algorithms designed specifically for regularized objectives often either fail completely or require some modification that involves a substantial compromise in performance.

We present new online and batch algorithms for training a variety of supervised learning models (such as SVMs, logistic regression, structured prediction models, and CRFs) under conditions where the optimal choice of regularization parameter results in functions with low curvature. We employ a technique called proximal regularization, in which we solve the original learning problem via a sequence of modified optimization tasks whose objectives are chosen to have greater curvature than the original problem. Theoretically, our algorithms achieve low regret bounds in the online setting and fast convergence in the batch setting. Experimentally, our algorithms improve upon state-of-the-art techniques, including Pegasos and bundle methods, on medium and large-scale SVM and structured learning tasks.
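The core idea described above, replacing a flat objective with a sequence of subproblems augmented by a quadratic proximal term, can be illustrated with a small sketch. This is not the paper's actual algorithm; the function name, step-size schedule, and toy linear-SVM setup below are illustrative assumptions.

```python
import numpy as np

def proximal_point_svm(X, y, lam=0.01, mu=1.0, outer_iters=20, inner_iters=50):
    """Sketch of the proximal-point idea: solve a sequence of
    subproblems, each augmented with a quadratic proximal term
    (mu/2)||w - w_prev||^2 that supplies the curvature a lightly
    regularized objective may lack."""
    n, d = X.shape
    w_prev = np.zeros(d)
    for t in range(outer_iters):
        # Inner loop: subgradient descent on the proximal subproblem
        #   F_t(w) = hinge(w) + (lam/2)||w||^2 + (mu/2)||w - w_prev||^2
        w = w_prev.copy()
        for k in range(inner_iters):
            margins = y * (X @ w)
            active = margins < 1                       # margin-violating examples
            g_hinge = -(X[active].T @ y[active]) / n   # hinge-loss subgradient
            g = g_hinge + lam * w + mu * (w - w_prev)
            # The subproblem is (lam + mu)-strongly convex, which permits
            # the classic O(1/(strong convexity * k)) step size.
            w -= g / ((lam + mu) * (k + 1))
        w_prev = w
    return w_prev
```

Even when `lam` alone is too small to make the original objective usefully strongly convex, each subproblem here is `lam + mu` strongly convex, so the inner solver enjoys fast convergence; the outer loop then drives `w_prev` toward a minimizer of the original objective.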

References

  1. Abernethy, J., Bartlett, P. L., Rakhlin, A., & Tewari, A. (2008). Optimal strategies and minimax lower bounds for online convex games. Proceedings of the 21st Annual Conference on Computational Learning Theory.
  2. Bartlett, P., Hazan, E., & Rakhlin, A. (2008). Adaptive online gradient descent. In J. Platt, D. Koller, Y. Singer and S. Roweis (Eds.), Advances in Neural Information Processing Systems 20, 65--72. MIT Press.
  3. Chapelle, O., Le, Q. V., & Smola, A. J. (2007). Large margin optimization of ranking measures. NIPS Workshop: Machine Learning for Web Search.
  4. Do, C. B., Woods, D. A., & Batzoglou, S. (2006). CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics, 22, e90--e98.
  5. Hazan, E., Agarwal, A., & Kale, S. (2007). Logarithmic regret algorithms for online convex optimization. Mach Learn, 69, 169--192.
  6. Joachims, T. (2006). Training linear SVMs in linear time. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 217--226).
  7. Kiwiel, K. C. (1983). Proximity control in bundle methods for convex nondifferentiable minimization. Math Program, 27, 320--341.
  8. Lemaréchal, C., Nemirovskii, A., & Nesterov, Y. (1995). New variants of bundle methods. Math Program, 69, 111--147.
  9. Schramm, H., & Zowe, J. (1992). A version of the bundle idea for minimizing a nonsmooth function: conceptual idea, convergence analysis, numerical results. SIAM J Optim, 2, 121--152.
  10. Shalev-Shwartz, S., & Kakade, S. M. (2009). Mind the duality gap: Logarithmic regret algorithms for online optimization. In D. Koller, D. Schuurmans, Y. Bengio and L. Bottou (Eds.), Advances in Neural Information Processing Systems 21, 1457--1464. MIT Press.
  11. Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. Proceedings of the 24th Annual International Conference on Machine Learning (pp. 807--814).
  12. Smola, A., Vishwanathan, S. V. N., & Le, Q. (2008). Bundle methods for machine learning. In J. Platt, D. Koller, Y. Singer and S. Roweis (Eds.), Advances in Neural Information Processing Systems 20, 1377--1384. MIT Press.
  13. Teo, C. H., Smola, A., Vishwanathan, S. V., & Le, Q. V. (2007). A scalable modular convex solver for regularized risk minimization. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 727--736).
  14. Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. Proceedings of the 20th Annual International Conference on Machine Learning.

Published in
ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009, 1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374
Copyright © 2009 by the author(s)/owner(s).
Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rate
Overall acceptance rate: 140 of 548 submissions, 26%
