Abstract
The Gauss-Newton approximation of the Hessian guarantees positive semi-definiteness while retaining more second-order information than the Fisher information matrix. We extend it from nonlinear least squares to all differentiable objectives such that positive semi-definiteness is maintained for the standard loss functions used in neural network regression and classification. We give efficient algorithms for computing the product of the extended Gauss-Newton and Fisher information matrices with arbitrary vectors, using techniques similar to, but even cheaper than, the fast Hessian-vector product [1]. The stability of stochastic meta-descent (SMD) [2,3,4,5], a learning rate adaptation method that uses curvature matrix-vector products, improves when the extended Gauss-Newton matrix is substituted for the Hessian.
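The structure that makes such products cheap is the factorization of the extended Gauss-Newton matrix as G = J^T H_L J, where J is the Jacobian of the network outputs with respect to the parameters and H_L is the Hessian of the loss with respect to the outputs, which is positive semi-definite for the standard losses mentioned above. A product G v then needs only one forward-mode and one reverse-mode differentiation pass. The following is a minimal modern sketch of that factorization in JAX, not code from the paper; the model, the loss, and all names (predict, ggn_vector_product) are hypothetical illustrations.

```python
import jax
import jax.numpy as jnp

# Hypothetical two-layer network; the paper's own models and notation differ.
def predict(params, x):
    w1, b1, w2, b2 = params
    h = jnp.tanh(x @ w1 + b1)
    return h @ w2 + b2

def loss(outputs, targets):
    # Sum-of-squares loss; for this choice the output-space Hessian H_L
    # is the identity, but the code below handles any twice-differentiable loss.
    return 0.5 * jnp.sum((outputs - targets) ** 2)

def ggn_vector_product(params, x, targets, v):
    # Extended Gauss-Newton product G v = J^T H_L (J v), where J is the
    # Jacobian of the outputs w.r.t. the parameters and H_L is the
    # positive semi-definite Hessian of the loss w.r.t. the outputs.
    f = lambda p: predict(p, x)
    outputs, jv = jax.jvp(f, (params,), (v,))        # forward pass: J v
    grad_out = jax.grad(lambda o: loss(o, targets))
    _, h_jv = jax.jvp(grad_out, (outputs,), (jv,))   # H_L (J v)
    _, vjp_fn = jax.vjp(f, params)
    return vjp_fn(h_jv)[0]                           # backward pass: J^T H_L J v

# Usage sketch on random data.
key = jax.random.PRNGKey(0)
k1, k2, k3, k4 = jax.random.split(key, 4)
params = (jax.random.normal(k1, (3, 5)), jnp.zeros(5),
          jax.random.normal(k2, (5, 2)), jnp.zeros(2))
x = jax.random.normal(k3, (10, 3))
targets = jax.random.normal(k4, (10, 2))
v = jax.tree_util.tree_map(jnp.ones_like, params)    # arbitrary direction vector
gv = ggn_vector_product(params, x, targets, v)
```

Since each product costs one Jacobian-vector and one vector-Jacobian pass plus a cheap output-space Hessian-vector product, the total work is a small constant multiple of a gradient evaluation, consistent with the O(n) cost per curvature matrix-vector product claimed in [4].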
References
[1] B. A. Pearlmutter, “Fast exact multiplication by the Hessian,” Neural Computation, vol. 6, no. 1, pp. 147–160, 1994.
[2] N. N. Schraudolph, “Local gain adaptation in stochastic gradient descent,” in Proc. 9th Int. Conf. Artificial Neural Networks, pp. 569–574, IEE, London, 1999.
[3] N. N. Schraudolph, “Online learning with adaptive local step sizes,” in Neural Nets-WIRN Vietri-99: Proc. 11th Italian Workshop on Neural Networks (M. Marinaro and R. Tagliaferri, eds.), Perspectives in Neural Computing, (Vietri sul Mare, Salerno, Italy), pp. 151–156, Springer-Verlag, Berlin, 1999.
[4] N. N. Schraudolph, “Fast second-order gradient descent via O(n) curvature matrix-vector products,” Tech. Rep. IDSIA-12-00, IDSIA, Galleria 2, CH-6928 Manno, Switzerland, 2000. Submitted to Neural Computation.
[5] N. N. Schraudolph and X. Giannakopoulos, “Online independent component analysis with local learning rate adaptation,” in Adv. Neural Info. Proc. Systems (S. A. Solla, T. K. Leen, and K.-R. Müller, eds.), vol. 12, pp. 789–795, The MIT Press, Cambridge, MA, 2000.
[6] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford: Clarendon, 1995.
[7] S.-i. Amari, Differential-Geometrical Methods in Statistics, vol. 28 of Lecture Notes in Statistics. New York: Springer-Verlag, 1985.
[8] S.-i. Amari, “Natural gradient works efficiently in learning,” Neural Computation, vol. 10, no. 2, pp. 251–276, 1998.
[9] J. Kivinen and M. K. Warmuth, “Additive versus exponentiated gradient updates for linear prediction,” in Proc. 27th Annual ACM Symp. Theory of Computing, (New York, NY), pp. 209–218, Association for Computing Machinery, 1995.
[10] N. N. Schraudolph, “A fast, compact approximation of the exponential function,” Neural Computation, vol. 11, no. 4, pp. 853–862, 1999.
[11] S. Singhal and L. Wu, “Training multilayer perceptrons with the extended Kalman filter,” in Adv. Neural Info. Proc. Systems: Proc. 1988 Conf. (D. S. Touretzky, ed.), pp. 133–140, Morgan Kaufmann, 1989.
[12] M. E. Harmon and L. C. Baird III, “Multi-player residual advantage learning with general function approximation,” Tech. Rep. WL-TR-1065, Wright Laboratory, WL/AACF, 2241 Avionics Circle, Wright-Patterson AFB, OH 45433-7308, 1996.
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Schraudolph, N.N. (2001). Fast Curvature Matrix-Vector Products. In: Dorffner, G., Bischof, H., Hornik, K. (eds) Artificial Neural Networks — ICANN 2001. ICANN 2001. Lecture Notes in Computer Science, vol 2130. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44668-0_4
Print ISBN: 978-3-540-42486-4
Online ISBN: 978-3-540-44668-2