Abstract
We consider the problem of developing rapid, stable, and scalable stochastic gradient descent algorithms for optimisation of very large nonlinear systems. Based on earlier work by Orr et al. on adaptive momentum—an efficient yet extremely unstable stochastic gradient descent algorithm—we develop a stabilised adaptive momentum algorithm that is suitable for noisy nonlinear optimisation problems. The stability is improved by introducing a forgetting factor 0 ≤ λ ≤ 1 that smoothes the trajectory and enables adaptation in non-stationary environments. The scalability of the new algorithm follows from the fact that at each iteration the multiplication by the curvature matrix can be achieved in O(n) steps using automatic differentiation tools. We illustrate the behaviour of the new algorithm on two examples: a linear neuron with squared loss and highly correlated inputs, and a multilayer perceptron applied to the four regions benchmark task.
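To make the scalability claim concrete, the sketch below (Python/JAX) shows how a curvature (Hessian) vector product can be obtained in O(n) by automatic differentiation, without ever forming the n × n matrix, and one place where a forgetting factor λ and such a product could enter a momentum-style update. The loss, the shapes, and the `step` rule are illustrative assumptions; they are not the paper's exact stabilised adaptive momentum formula.

```python
import jax
import jax.numpy as jnp

# Squared loss of a linear neuron, as in the paper's first example
# (the concrete batch format used here is an assumption).
def loss(w, batch):
    x, y = batch
    return 0.5 * jnp.mean((x @ w - y) ** 2)

# Curvature(Hessian)-vector product via forward-over-reverse autodiff:
# costs O(n) per call and never builds the n x n matrix (Pearlmutter, 1994).
def hvp(w, batch, v):
    return jax.jvp(lambda p: jax.grad(loss)(p, batch), (w,), (v,))[1]

# Illustrative momentum-style update with forgetting factor lam in [0, 1].
# NOTE: this is not the paper's stabilised adaptive-momentum rule; it only
# shows where the O(n) curvature-vector product and lam enter an iteration.
def step(w, m, batch, eta=0.01, lam=0.95):
    g = jax.grad(loss)(w, batch)
    Cm = hvp(w, batch, m)               # curvature times momentum vector
    m = lam * (m - eta * Cm) - eta * g  # lam smooths the trajectory
    return w + m, m
```

With x of shape (batch, n) and w, m of shape (n,), repeated calls to step trace a λ-smoothed trajectory; the per-step cost stays linear in n because the curvature enters only through matrix-vector products.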
References
Y. LeCun, P. Y. Simard, and B. Pearlmutter. Automatic learning rate maximization in large adaptive machines. In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems, volume 5, pages 156–163. Morgan Kaufmann, San Mateo, CA, 1993.
T. K. Leen and G. B. Orr. Optimal stochastic search and adaptive momentum. In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems, volume 6, pages 477–484. Morgan Kaufmann, San Francisco, CA, 1994.
D. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, 11(2):431–441, 1963.
G. B. Orr. Dynamics and Algorithms for Stochastic Learning. PhD thesis, Department of Computer Science and Engineering, Oregon Graduate Institute, Beaverton, OR 97006, 1995. ftp://neural.cse.ogi.edu/pub/neural/papers/orrPhDch1-5.ps.Z, orrPhDch6-9.ps.Z.
B. A. Pearlmutter. Fast exact multiplication by the Hessian. Neural Computation, 6(1):147–160, 1994.
N. N. Schraudolph. Fast curvature matrix-vector products for second-order gradient descent. Neural Computation, 14(7), 2002. http://www.inf.ethz.ch/~schraudo/pubs/mvp.ps.gz.
S. Singhal and L. Wu. Training multilayer perceptrons with the extended Kalman filter. In D. S. Touretzky, editor, Advances in Neural Information Processing Systems. Proceedings of the 1988 Conference, pages 133–140, San Mateo, CA, 1989. Morgan Kaufmann.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Graepel, T., Schraudolph, N.N. (2002). Stable Adaptive Momentum for Rapid Online Learning in Nonlinear Systems. In: Dorronsoro, J.R. (eds) Artificial Neural Networks — ICANN 2002. ICANN 2002. Lecture Notes in Computer Science, vol 2415. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46084-5_73
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44074-1
Online ISBN: 978-3-540-46084-8