Fast Curvature Matrix-Vector Products

  • Conference paper

Artificial Neural Networks — ICANN 2001 (ICANN 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2130)

Abstract

The Gauss-Newton approximation of the Hessian guarantees positive semi-definiteness while retaining more second-order information than the Fisher information. We extend it from nonlinear least squares to all differentiable objectives such that positive semi-definiteness is maintained for the standard loss functions in neural network regression and classification. We give efficient algorithms for computing the product of extended Gauss-Newton and Fisher information matrices with arbitrary vectors, using techniques similar to but even cheaper than the fast Hessian-vector product [1]. The stability of SMD [2,3,4,5], a learning rate adaptation method that uses curvature matrix-vector products, improves when the extended Gauss-Newton matrix is substituted for the Hessian.
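
Only the abstract is available on this page. As a rough illustration of the kind of computation it describes, the sketch below forms Gauss-Newton and Hessian vector products with automatic differentiation, without ever building the curvature matrices. This is not the paper's own algorithm (which works pass-by-pass through the network and covers the extended Gauss-Newton and Fisher matrices); the use of JAX, the `model`, `loss`, and all shapes are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def hessian_vector_product(objective, params, v):
    # Pearlmutter-style Hv [1]: forward-mode derivative of the gradient.
    return jax.jvp(jax.grad(objective), (params,), (v,))[1]

def gauss_newton_vector_product(model, loss, params, x, y, v):
    # G v = J^T H_L J v, where J is the Jacobian of the model outputs
    # w.r.t. the parameters and H_L is the Hessian of the loss w.r.t.
    # the outputs; G itself is never formed.
    outputs, jv = jax.jvp(lambda p: model(p, x), (params,), (v,))          # J v
    hl_jv = jax.jvp(jax.grad(lambda o: loss(o, y)), (outputs,), (jv,))[1]  # H_L J v
    _, pullback = jax.vjp(lambda p: model(p, x), params)
    return pullback(hl_jv)[0]                                              # J^T H_L J v

# Toy usage (illustrative only): a linear model with squared-error loss,
# for which H_L = I and hence G v = X^T X v.
model = lambda p, x: x @ p
loss = lambda out, y: 0.5 * jnp.sum((out - y) ** 2)
params, v = jnp.zeros(3), jnp.ones(3)
x, y = jnp.ones((4, 3)), jnp.ones(4)
gv = gauss_newton_vector_product(model, loss, params, x, y, v)
hv = hessian_vector_product(lambda p: loss(model(p, x), y), params, v)
```

Both products cost only a small constant multiple of a gradient evaluation; the Gauss-Newton version drops the term involving second derivatives of the model, which is what guarantees positive semi-definiteness whenever the loss is convex in the model outputs.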

References

  1. B. A. Pearlmutter, “Fast exact multiplication by the Hessian,” Neural Computation, vol. 6, no. 1, pp. 147–160, 1994.

  2. N. N. Schraudolph, “Local gain adaptation in stochastic gradient descent,” in Proc. 9th Int. Conf. Artificial Neural Networks, pp. 569–574, IEE, London, 1999.

  3. N. N. Schraudolph, “Online learning with adaptive local step sizes,” in Neural Nets-WIRN Vietri-99: Proc. 11th Italian Workshop on Neural Networks (M. Marinaro and R. Tagliaferri, eds.), Perspectives in Neural Computing, (Vietri sul Mare, Salerno, Italy), pp. 151–156, Springer Verlag, Berlin, 1999.

  4. N. N. Schraudolph, “Fast second-order gradient descent via O(n) curvature matrix-vector products,” Tech. Rep. IDSIA-12-00, IDSIA, Galleria 2, CH-6928 Manno, Switzerland, 2000. Submitted to Neural Computation.

  5. N. N. Schraudolph and X. Giannakopoulos, “Online independent component analysis with local learning rate adaptation,” in Adv. Neural Info. Proc. Systems (S. A. Solla, T. K. Leen, and K.-R. Müller, eds.), vol. 12, pp. 789–795, The MIT Press, Cambridge, MA, 2000.

  6. C. M. Bishop, Neural Networks for Pattern Recognition. Oxford: Clarendon, 1995.

  7. S.-i. Amari, Differential-Geometrical Methods in Statistics, vol. 28 of Lecture Notes in Statistics. New York: Springer Verlag, 1985.

  8. S.-i. Amari, “Natural gradient works efficiently in learning,” Neural Computation, vol. 10, no. 2, pp. 251–276, 1998.

  9. J. Kivinen and M. K. Warmuth, “Additive versus exponentiated gradient updates for linear prediction,” in Proc. 27th Annual ACM Symp. Theory of Computing, (New York, NY), pp. 209–218, Association for Computing Machinery, 1995.

  10. N. N. Schraudolph, “A fast, compact approximation of the exponential function,” Neural Computation, vol. 11, no. 4, pp. 853–862, 1999.

  11. S. Singhal and L. Wu, “Training multilayer perceptrons with the extended Kalman filter,” in Adv. Neural Info. Proc. Systems: Proc. 1988 Conf. (D. S. Touretzky, ed.), pp. 133–140, Morgan Kaufmann, 1989.

  12. M. E. Harmon and L. C. Baird III, “Multi-player residual advantage learning with general function approximation,” Tech. Rep. WL-TR-1065, Wright Laboratory, WL/AACF, 2241 Avionics Circle, Wright-Patterson AFB, OH 45433-7308, 1996.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schraudolph, N.N. (2001). Fast Curvature Matrix-Vector Products. In: Dorffner, G., Bischof, H., Hornik, K. (eds) Artificial Neural Networks — ICANN 2001. ICANN 2001. Lecture Notes in Computer Science, vol 2130. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44668-0_4

  • DOI: https://doi.org/10.1007/3-540-44668-0_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42486-4

  • Online ISBN: 978-3-540-44668-2

  • eBook Packages: Springer Book Archive
