Abstract
The Gauss-Newton approximation of the Hessian guarantees positive semi-definiteness while retaining more second-order information than the Fisher information matrix. We extend it from nonlinear least squares to all differentiable objectives such that positive semi-definiteness is maintained for the standard loss functions used in neural network regression and classification. We give efficient algorithms for computing the product of the extended Gauss-Newton and Fisher information matrices with arbitrary vectors, using techniques similar to, but even cheaper than, the fast Hessian-vector product [1]. The stability of stochastic meta-descent (SMD) [2,3,4,5], a learning rate adaptation method that uses curvature matrix-vector products, improves when the extended Gauss-Newton matrix is substituted for the Hessian.
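The structure that makes such products cheap is the factorization of the extended Gauss-Newton matrix as G = J^T H_L J, where J is the Jacobian of the network outputs with respect to the parameters and H_L is the Hessian of the loss with respect to the outputs, which is positive semi-definite for the standard losses mentioned above. A product G v then needs only one forward-mode and one reverse-mode differentiation pass. The following is a minimal modern sketch of that factorization in JAX, not code from the paper; the model, the loss, and all names (predict, ggn_vector_product) are hypothetical illustrations.

```python
import jax
import jax.numpy as jnp

# Hypothetical two-layer network; the paper's own models and notation differ.
def predict(params, x):
    w1, b1, w2, b2 = params
    h = jnp.tanh(x @ w1 + b1)
    return h @ w2 + b2

def loss(outputs, targets):
    # Sum-of-squares loss; for this choice the output-space Hessian H_L
    # is the identity, but the code below handles any twice-differentiable loss.
    return 0.5 * jnp.sum((outputs - targets) ** 2)

def ggn_vector_product(params, x, targets, v):
    # Extended Gauss-Newton product G v = J^T H_L (J v), where J is the
    # Jacobian of the outputs w.r.t. the parameters and H_L is the
    # positive semi-definite Hessian of the loss w.r.t. the outputs.
    f = lambda p: predict(p, x)
    outputs, jv = jax.jvp(f, (params,), (v,))        # forward pass: J v
    grad_out = jax.grad(lambda o: loss(o, targets))
    _, h_jv = jax.jvp(grad_out, (outputs,), (jv,))   # H_L (J v)
    _, vjp_fn = jax.vjp(f, params)
    return vjp_fn(h_jv)[0]                           # backward pass: J^T H_L J v

# Usage sketch on random data.
key = jax.random.PRNGKey(0)
k1, k2, k3, k4 = jax.random.split(key, 4)
params = (jax.random.normal(k1, (3, 5)), jnp.zeros(5),
          jax.random.normal(k2, (5, 2)), jnp.zeros(2))
x = jax.random.normal(k3, (10, 3))
targets = jax.random.normal(k4, (10, 2))
v = jax.tree_util.tree_map(jnp.ones_like, params)    # arbitrary direction vector
gv = ggn_vector_product(params, x, targets, v)
```

Since each product costs one Jacobian-vector and one vector-Jacobian pass plus a cheap output-space Hessian-vector product, the total work is a small constant multiple of a gradient evaluation, consistent with the O(n) cost per curvature matrix-vector product claimed in [4].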
References
[1] B. A. Pearlmutter, “Fast exact multiplication by the Hessian,” Neural Computation, vol. 6, no. 1, pp. 147–160, 1994.
[2] N. N. Schraudolph, “Local gain adaptation in stochastic gradient descent,” in Proc. 9th Int. Conf. Artificial Neural Networks, pp. 569–574, IEE, London, 1999.
[3] N. N. Schraudolph, “Online learning with adaptive local step sizes,” in Neural Nets-WIRN Vietri-99: Proc. 11th Italian Workshop on Neural Networks (M. Marinaro and R. Tagliaferri, eds.), Perspectives in Neural Computing, (Vietri sul Mare, Salerno, Italy), pp. 151–156, Springer-Verlag, Berlin, 1999.
[4] N. N. Schraudolph, “Fast second-order gradient descent via O(n) curvature matrix-vector products,” Tech. Rep. IDSIA-12-00, IDSIA, Galleria 2, CH-6928 Manno, Switzerland, 2000. Submitted to Neural Computation.
[5] N. N. Schraudolph and X. Giannakopoulos, “Online independent component analysis with local learning rate adaptation,” in Adv. Neural Info. Proc. Systems (S. A. Solla, T. K. Leen, and K.-R. Müller, eds.), vol. 12, pp. 789–795, The MIT Press, Cambridge, MA, 2000.
[6] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford: Clarendon, 1995.
[7] S.-i. Amari, Differential-Geometrical Methods in Statistics, vol. 28 of Lecture Notes in Statistics. New York: Springer-Verlag, 1985.
[8] S.-i. Amari, “Natural gradient works efficiently in learning,” Neural Computation, vol. 10, no. 2, pp. 251–276, 1998.
[9] J. Kivinen and M. K. Warmuth, “Additive versus exponentiated gradient updates for linear prediction,” in Proc. 27th Annual ACM Symp. Theory of Computing, (New York, NY), pp. 209–218, Association for Computing Machinery, 1995.
[10] N. N. Schraudolph, “A fast, compact approximation of the exponential function,” Neural Computation, vol. 11, no. 4, pp. 853–862, 1999.
[11] S. Singhal and L. Wu, “Training multilayer perceptrons with the extended Kalman filter,” in Adv. Neural Info. Proc. Systems: Proc. 1988 Conf. (D. S. Touretzky, ed.), pp. 133–140, Morgan Kaufmann, 1989.
[12] M. E. Harmon and L. C. Baird III, “Multi-player residual advantage learning with general function approximation,” Tech. Rep. WL-TR-1065, Wright Laboratory, WL/AACF, 2241 Avionics Circle, Wright-Patterson AFB, OH 45433-7308, 1996.
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Schraudolph, N.N. (2001). Fast Curvature Matrix-Vector Products. In: Dorffner, G., Bischof, H., Hornik, K. (eds) Artificial Neural Networks — ICANN 2001. ICANN 2001. Lecture Notes in Computer Science, vol 2130. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44668-0_4
Print ISBN: 978-3-540-42486-4
Online ISBN: 978-3-540-44668-2