Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3176)

Abstract

This contribution presents an overview of the theoretical and practical aspects of the broad family of learning algorithms based on Stochastic Gradient Descent, including Perceptrons, Adalines, K-Means, LVQ, Multi-Layer Networks, and Graph Transformer Networks.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bottou, L. (2004). Stochastic Learning. In: Bousquet, O., von Luxburg, U., Rätsch, G. (eds) Advanced Lectures on Machine Learning. ML 2003. Lecture Notes in Computer Science, vol 3176. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28650-9_7

  • DOI: https://doi.org/10.1007/978-3-540-28650-9_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23122-6

  • Online ISBN: 978-3-540-28650-9

  • eBook Packages: Springer Book Archive
