
Adaptive Sampling for Incremental Optimization Using Stochastic Gradient Descent

  • Conference paper
  • Conference: Algorithmic Learning Theory (ALT 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9355)

Abstract

A wide range of popular statistical learning methods, from K-means and Support Vector Machines to Neural Networks, can be formulated as stochastic gradient descent (SGD) in a suitable setup. In practice, the main limitation of this incremental optimization technique is the stochastic noise induced by the random selection of the data used to compute the gradient estimate at each iteration. This paper introduces a novel implementation of the SGD algorithm in which the data subset used at a given step is not picked uniformly at random among all possible subsets, but drawn from a specific adaptive sampling scheme that depends on the past iterations in a Markovian manner, so as to refine the current statistical estimate of the gradient. Beyond an algorithmic description of the proposed approach, rate bounds are established and illustrative numerical results are displayed, providing theoretical and empirical evidence of its statistical performance compared to more “naive” SGD implementations. Computational issues are also discussed at length, revealing the practical advantages of the proposed method.
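To make the sampling idea concrete, the following minimal Python sketch runs SGD on a least-squares objective with non-uniform, adaptively updated sampling probabilities instead of uniform draws. It is not the scheme analyzed in the paper: the least-squares loss, the gradient-norm update of the sampling weights, and every name and parameter (adaptive_sampling_sgd, mix, step) are illustrative assumptions. What it does illustrate is the combination of a non-uniform draw with an importance-weight correction that keeps the stochastic gradient unbiased.

    import numpy as np

    def adaptive_sampling_sgd(X, y, n_iters=1000, step=0.01, mix=0.5, seed=0):
        """SGD for least squares with adaptively reweighted example sampling (illustrative sketch)."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        scores = np.ones(n)  # running per-example importance scores, initially uniform
        for _ in range(n_iters):
            # Mix the adaptive distribution with the uniform one so that every
            # example keeps a strictly positive probability of being drawn.
            probs = mix * scores / scores.sum() + (1.0 - mix) / n
            i = rng.choice(n, p=probs)
            residual = X[i] @ w - y[i]
            grad = residual * X[i]          # gradient of 0.5 * (x_i . w - y_i)^2
            # Importance-weight correction: dividing by n * p_i keeps the
            # stochastic gradient an unbiased estimate of the full gradient.
            w -= step * grad / (n * probs[i])
            # Update the visited example's score from its gradient norm so that
            # examples with large gradients are revisited more often.
            scores[i] = 0.9 * scores[i] + 0.1 * (np.linalg.norm(grad) + 1e-12)
        return w

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        X = rng.standard_normal((200, 5))
        w_true = rng.standard_normal(5)
        y = X @ w_true + 0.01 * rng.standard_normal(200)
        w_hat = adaptive_sampling_sgd(X, y, n_iters=5000)
        print("estimation error:", np.linalg.norm(w_hat - w_true))

With mix strictly below 1 every example remains reachable, and reweighting the gradient by 1/(n * p_i) compensates for the distorted sampling distribution; this is the standard device for retaining an unbiased gradient estimate under non-uniform sampling.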



Author information

Correspondence to Pascal Bianchi.



Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Papa, G., Bianchi, P., Clémençon, S. (2015). Adaptive Sampling for Incremental Optimization Using Stochastic Gradient Descent. In: Chaudhuri, K., Gentile, C., Zilles, S. (eds) Algorithmic Learning Theory. ALT 2015. Lecture Notes in Computer Science, vol. 9355. Springer, Cham. https://doi.org/10.1007/978-3-319-24486-0_21


  • DOI: https://doi.org/10.1007/978-3-319-24486-0_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24485-3

  • Online ISBN: 978-3-319-24486-0
