Abstract
In this paper we propose an algorithm for training neural network architectures, called the Hessian Free algorithm with Curvature Scaled Adaptive Momentum (HF-CSAM). The algorithm's weight update rule is similar to that of SGD with momentum, but with two main differences arising from the formulation of the training task as a constrained optimization problem: (i) the momentum term is scaled with curvature information (in the form of the Hessian); (ii) the coefficients for the learning rate and the scaled momentum term are adaptively determined. The implementation of the algorithm requires minimal additional computation compared to a classical SGD-with-momentum iteration, since no explicit computation of the Hessian is needed: the algorithm only requires a Hessian-vector product, which can be computed exactly and very efficiently within any modern computational graph framework such as TensorFlow. We report experiments with different neural network architectures trained on standard neural network benchmarks which demonstrate the efficiency of the proposed method.
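As an illustration of the Hessian-vector product mentioned above, the following is a minimal sketch in TensorFlow 2 using nested gradient tapes (double backpropagation, in the spirit of Pearlmutter's trick). The function name and the toy quadratic loss are illustrative assumptions; this is not the paper's HF-CSAM code and does not reproduce its adaptive update formulas.

    import tensorflow as tf

    def hessian_vector_product(loss_fn, params, vector):
        # Exact H*v without ever forming H: take the gradient of the
        # inner product <grad(L), v> with respect to the parameters.
        with tf.GradientTape() as outer_tape:
            with tf.GradientTape() as inner_tape:
                loss = loss_fn()
            grads = inner_tape.gradient(loss, params)
            grad_dot_v = tf.add_n(
                [tf.reduce_sum(g * v) for g, v in zip(grads, vector)]
            )
        return outer_tape.gradient(grad_dot_v, params)

    # Toy check: for L(w) = sum(w^2) the Hessian is 2*I, so H*v = 2*v.
    w = tf.Variable([1.0, 2.0])
    loss_fn = lambda: tf.reduce_sum(w * w)
    v = [tf.constant([1.0, -1.0])]
    print(hessian_vector_product(loss_fn, [w], v))  # approx. [2., -2.]

In an HF-CSAM-style iteration the vector would be the previous weight update (the momentum direction), so the extra cost is essentially one additional backward pass, which is consistent with the abstract's claim of minimal overhead over SGD with momentum.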
Funding
Mrs. Sakketou is supported by a Ph.D. scholarship from the State Scholarships Foundation (IKY), Greece.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Sakketou, F., Ampazis, N. (2020). A Hessian Free Neural Networks Training Algorithm with Curvature Scaled Adaptive Momentum. In: Matsatsinis, N., Marinakis, Y., Pardalos, P. (eds) Learning and Intelligent Optimization. LION 2019. Lecture Notes in Computer Science, vol 11968. Springer, Cham. https://doi.org/10.1007/978-3-030-38629-0_8
Print ISBN: 978-3-030-38628-3
Online ISBN: 978-3-030-38629-0