Stochastic Learning

Bottou, Léon

doi:10.1007/978-3-540-28650-9_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3176)

Included in the following conference series:

Summer School on Machine Learning

Abstract

This contribution presents an overview of the theoretical and practical aspects of the broad family of learning algorithms based on Stochastic Gradient Descent, including Perceptrons, Adalines, K-Means, LVQ, Multi-Layer Networks, and Graph Transformer Networks.

Author information

Authors and Affiliations

NEC Labs of America, 4 Independence Way, Princeton, NJ 08540, USA
Léon Bottou

Authors

Léon Bottou

Editor information

Editors and Affiliations

Pertinence, 32 rue des Jeûneurs, 75002, Paris, France
Olivier Bousquet
Max Planck Institute for Biological Cybernetics, Spemannstr. 38, 72076, Tübingen, Germany
Ulrike von Luxburg
Friedrich Miescher Laboratory of the Max Planck Society, Spemannstr. 39, 72076, Tübingen, Germany
Gunnar Rätsch

About this chapter

Cite this chapter

Bottou, L. (2004). Stochastic Learning. In: Bousquet, O., von Luxburg, U., Rätsch, G. (eds) Advanced Lectures on Machine Learning. ML 2003. Lecture Notes in Computer Science, vol 3176. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28650-9_7

DOI: https://doi.org/10.1007/978-3-540-28650-9_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23122-6
Online ISBN: 978-3-540-28650-9
eBook Packages: Springer Book Archive
