ABSTRACT
The classic Stochastic Approximation (SA) method achieves optimal rates under the black-box model. This optimality, however, does not rule out better algorithms when more information about the functions and the data is available.
We present a family of Noise Adaptive Stochastic Approximation (NASA) algorithms for online convex optimization and stochastic convex optimization. NASA is an adaptive variant of Mirror Descent Stochastic Approximation, novel in its practical variation-dependent stepsizes and its stronger theoretical guarantees. We show that, compared with state-of-the-art adaptive and non-adaptive SA methods, NASA achieves lower regrets and faster convergence rates under low-variation assumptions.
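To illustrate the general idea of variation-dependent stepsizes, the sketch below runs projected online gradient descent (the Euclidean special case of mirror descent) with a stepsize that shrinks as the accumulated variation between successive gradients grows. This is a generic illustration under assumed choices (the specific variation measure, the `D / sqrt(1 + variation)` rule, and projection onto a norm ball are all assumptions), not the NASA algorithm itself:

```python
import numpy as np

def ogd_variation_stepsize(grads, D=1.0, radius=1.0):
    """Projected online gradient descent with a variation-dependent
    stepsize.  Illustrative sketch only; the stepsize rule and the
    feasible set (a Euclidean ball) are assumptions, not the paper's
    NASA algorithm."""
    d = len(grads[0])
    x = np.zeros(d)
    prev_g = np.zeros(d)
    variation = 0.0                 # accumulated squared gradient variation
    iterates = []
    for g in grads:
        variation += float(np.dot(g - prev_g, g - prev_g))
        eta = D / np.sqrt(1.0 + variation)   # low variation => larger steps
        x = x - eta * g
        norm = np.linalg.norm(x)
        if norm > radius:                    # project back onto the ball
            x = x * (radius / norm)
        iterates.append(x.copy())
        prev_g = g
    return iterates

# With identical gradients the variation stops growing, so the
# stepsize stays constant instead of decaying like 1/sqrt(t).
its = ogd_variation_stepsize([np.array([1.0, 0.0]), np.array([1.0, 0.0])])
```

When the gradient sequence has low variation, the stepsize stays large and the iterates move quickly; an adversarially fluctuating sequence drives the stepsize down, recovering the usual conservative behavior.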