
A Hessian Free Neural Networks Training Algorithm with Curvature Scaled Adaptive Momentum

  • Conference paper
  • First Online:
Learning and Intelligent Optimization (LION 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11968)


Abstract

In this paper we propose an algorithm for training neural network architectures, called the Hessian Free algorithm with Curvature Scaled Adaptive Momentum (HF-CSAM). The algorithm's weight update rule is similar to SGD with momentum, but with two main differences arising from the formulation of the training task as a constrained optimization problem: (i) the momentum term is scaled with curvature information (in the form of the Hessian); (ii) the coefficients of the learning rate and of the scaled momentum term are determined adaptively. The implementation requires minimal additional computation compared to a classical SGD-with-momentum iteration, since the Hessian is never formed explicitly: the algorithm only needs a Hessian-vector product, which can be computed exactly and very efficiently within any modern computational-graph framework such as TensorFlow. We report experiments with different neural network architectures trained on standard benchmarks, which demonstrate the efficiency of the proposed method.
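The adaptive coefficients for the learning rate and the curvature-scaled momentum term follow from the paper's constrained-optimization derivation (and from the code repository linked in the Notes below); they are not reproduced here. As a minimal sketch of the Hessian-free ingredient only, the following TensorFlow snippet shows how an exact Hessian-vector product can be obtained with nested gradient tapes (Pearlmutter's trick), which is what the update rule needs in place of the full Hessian. The helper name, the toy model, and the placeholder direction dW are illustrative assumptions, not taken from the paper.

import tensorflow as tf

# Sketch only: exact Hessian-vector product H·v via nested gradient tapes.
# H is never materialized; we differentiate the scalar g·v, where g = ∇L(w).
def hessian_vector_product(loss_fn, params, vector):
    with tf.GradientTape() as outer_tape:
        with tf.GradientTape() as inner_tape:
            loss = loss_fn()
        grads = inner_tape.gradient(loss, params)             # g = ∇L(w)
        grad_dot_v = tf.add_n([tf.reduce_sum(g * v)
                               for g, v in zip(grads, vector)])
    return outer_tape.gradient(grad_dot_v, params)            # H·v

# Hypothetical usage: the vector plays the role of the previous update
# direction (the momentum term) that is scaled with curvature information.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
x = tf.random.normal((8, 4))
y = tf.random.normal((8, 1))

def loss_fn():
    return tf.reduce_mean(tf.square(model(x) - y))

params = model.trainable_variables
dW = [tf.ones_like(p) for p in params]      # placeholder previous step
Hv = hessian_vector_product(loss_fn, params, dW)

Each such product costs roughly one additional gradient evaluation, which is why the overhead over a plain SGD-with-momentum iteration stays small.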


Notes

  1. https://github.com/flo3003/HF-CSAM.



Funding

Mrs. Sakketou is supported by a Ph.D. Scholarship by the State Scholarships Foundation (IKY), Greece.

Author information


Corresponding author

Correspondence to Flora Sakketou.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Sakketou, F., Ampazis, N. (2020). A Hessian Free Neural Networks Training Algorithm with Curvature Scaled Adaptive Momentum. In: Matsatsinis, N., Marinakis, Y., Pardalos, P. (eds.) Learning and Intelligent Optimization. LION 2019. Lecture Notes in Computer Science, vol. 11968. Springer, Cham. https://doi.org/10.1007/978-3-030-38629-0_8

