Abstract
We consider the problem of developing rapid, stable, and scalable stochastic gradient descent algorithms for optimisation of very large nonlinear systems. Based on earlier work by Orr et al. on adaptive momentum—an efficient yet extremely unstable stochastic gradient descent algorithm—we develop a stabilised adaptive momentum algorithm that is suitable for noisy nonlinear optimisation problems. The stability is improved by introducing a forgetting factor 0 ≤ λ ≤ 1 that smoothes the trajectory and enables adaptation in non-stationary environments. The scalability of the new algorithm follows from the fact that at each iteration the multiplication by the curvature matrix can be achieved in O(n) steps using automatic differentiation tools. We illustrate the behaviour of the new algorithm on two examples: a linear neuron with squared loss and highly correlated inputs, and a multilayer perceptron applied to the four regions benchmark task.
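To make the scalability claim concrete, the sketch below (Python/JAX) shows how a curvature (Hessian) vector product can be obtained in O(n) by automatic differentiation, without ever forming the n × n matrix, and one place where a forgetting factor λ and such a product could enter a momentum-style update. The loss, the shapes, and the `step` rule are illustrative assumptions; they are not the paper's exact stabilised adaptive momentum formula.

```python
import jax
import jax.numpy as jnp

# Squared loss of a linear neuron, as in the paper's first example
# (the concrete batch format used here is an assumption).
def loss(w, batch):
    x, y = batch
    return 0.5 * jnp.mean((x @ w - y) ** 2)

# Curvature(Hessian)-vector product via forward-over-reverse autodiff:
# costs O(n) per call and never builds the n x n matrix (Pearlmutter, 1994).
def hvp(w, batch, v):
    return jax.jvp(lambda p: jax.grad(loss)(p, batch), (w,), (v,))[1]

# Illustrative momentum-style update with forgetting factor lam in [0, 1].
# NOTE: this is not the paper's stabilised adaptive-momentum rule; it only
# shows where the O(n) curvature-vector product and lam enter an iteration.
def step(w, m, batch, eta=0.01, lam=0.95):
    g = jax.grad(loss)(w, batch)
    Cm = hvp(w, batch, m)               # curvature times momentum vector
    m = lam * (m - eta * Cm) - eta * g  # lam smooths the trajectory
    return w + m, m
```

With x of shape (batch, n) and w, m of shape (n,), repeated calls to step trace a λ-smoothed trajectory; the per-step cost stays linear in n because the curvature enters only through matrix-vector products.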
References
Y. LeCun, P. Y. Simard, and B. Pearlmutter. Automatic learning rate maximization in large adaptive machines. In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems, volume 5, pages 156–163. Morgan Kaufmann, San Mateo, CA, 1993.
T. K. Leen and G. B. Orr. Optimal stochastic search and adaptive momentum. In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems, volume 6, pages 477–484. Morgan Kaufmann, San Francisco, CA, 1994.
D. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, 11(2):431–441, 1963.
G. B. Orr. Dynamics and Algorithms for Stochastic Learning. PhD thesis, Department of Computer Science and Engineering, Oregon Graduate Institute, Beaverton, OR 97006, 1995. ftp://neural.cse.ogi.edu/pub/neural/papers/orrPhDch1-5.ps.Z, orrPhDch6-9.ps.Z.
B. A. Pearlmutter. Fast exact multiplication by the Hessian. Neural Computation, 6(1):147–160, 1994.
N. N. Schraudolph. Fast curvature matrix-vector products for second-order gradient descent. Neural Computation, 14(7), 2002. http://www.inf.ethz.ch/~schraudo/pubs/mvp.ps.gz.
S. Singhal and L. Wu. Training multilayer perceptrons with the extended Kalman filter. In D. S. Touretzky, editor, Advances in Neural Information Processing Systems. Proceedings of the 1988 Conference, pages 133–140, San Mateo, CA, 1989. Morgan Kaufmann.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Graepel, T., Schraudolph, N.N. (2002). Stable Adaptive Momentum for Rapid Online Learning in Nonlinear Systems. In: Dorronsoro, J.R. (eds) Artificial Neural Networks — ICANN 2002. ICANN 2002. Lecture Notes in Computer Science, vol 2415. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46084-5_73
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44074-1
Online ISBN: 978-3-540-46084-8