Parameter Tuning Using Adaptive Moment Estimation in Deep Learning Neural Networks

Okewu, Emmanuel; Misra, Sanjay; Lius, Fernandez-Sanz

doi:10.1007/978-3-030-58817-5_20

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12254)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

Two concerns drive the choice of a stochastic optimizer for training deep neural networks: loss quality (accuracy) and training time. Optimization methods for machine learning include gradient descent, simulated annealing, genetic algorithms, and second-order techniques such as Newton's method. Gradient descent, however, is the most popular method for optimizing neural networks. Over time, researchers have made gradient descent more responsive to the requirements of improved loss quality (accuracy) and reduced training time by progressing from a simple learning rate to adaptive moment estimation for parameter tuning. In this work, we investigate the performance of established stochastic gradient descent algorithms such as Adam, RMSProp, Adagrad, and Adadelta in terms of training time and loss quality. We show practically, using a series of stochastic experiments, that adaptive moment estimation has improved the gradient descent optimization method. Based on the empirical outcomes, we recommend further improving the method by using higher moments of the gradient for parameter tuning (weight update). The output of our experiments also indicates that a neural network is a stochastic algorithm.
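
To make the idea of parameter tuning by adaptive moment estimation concrete, the following is a minimal sketch of the commonly published Adam update rule applied to a toy quadratic. It is not the authors' experimental code: the function names `adam_minimize` and `quadratic_grad`, the toy objective, and the hyperparameter values (including the learning rate of 0.1) are assumptions chosen only for illustration.

```python
import math


def adam_minimize(grad, theta, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Run `steps` Adam updates on `theta` using the gradient function `grad`.

    Hypothetical helper for illustration; hyperparameter defaults are assumptions.
    """
    theta = list(theta)
    m = [0.0] * len(theta)  # running estimate of the first moment (mean) of the gradient
    v = [0.0] * len(theta)  # running estimate of the second (uncentered) moment
    for t in range(1, steps + 1):
        g = grad(theta)
        for i, g_i in enumerate(g):
            # Exponential moving averages of the gradient and the squared gradient.
            m[i] = beta1 * m[i] + (1 - beta1) * g_i
            v[i] = beta2 * v[i] + (1 - beta2) * g_i * g_i
            # Bias correction compensates for the zero initialization of m and v.
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            # Per-parameter step: the learning rate is scaled by the moment estimates.
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


def quadratic_grad(theta):
    """Gradient of the toy objective f(x, y) = (x - 3)^2 + (y + 1)^2."""
    return [2 * (theta[0] - 3), 2 * (theta[1] + 1)]


result = adam_minimize(quadratic_grad, [0.0, 0.0])
print(result)  # should be close to the minimizer [3, -1]
```

In this sketch each parameter gets its own effective step size, because the update divides the first-moment estimate by the square root of the second-moment estimate. On the very first step the two estimates cancel to the sign of the gradient, so every parameter moves by about `lr` regardless of how large its gradient is.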

Author information

Authors and Affiliations

Centre for Information Technology and Systems, University of Lagos, Lagos, Nigeria
Emmanuel Okewu
Department of Electrical and Information Engineering, Covenant University, Ota, Nigeria
Sanjay Misra
Department of Computer Sciences, University of Alcala, Alcalá de Henares, Spain
Fernandez-Sanz Lius

Corresponding author

Correspondence to Emmanuel Okewu.

Editor information

Editors and Affiliations

University of Perugia, Perugia, Italy
Osvaldo Gervasi
University of Basilicata, Potenza, Potenza, Italy
Beniamino Murgante
Chair, Center of ICT/ICE, Covenant University, Ota, Nigeria
Sanjay Misra
University of Cagliari, Cagliari, Italy
Chiara Garau
University of Cagliari, Cagliari, Italy
Ivan Blečić
Clayton School of Information Technology, Monash University, Clayton, VIC, Australia
David Taniar
Department of Information Science, Kyushu Sangyo University, Fukuoka, Japan
Bernady O. Apduhan
University of Minho, Braga, Portugal
Ana Maria A. C. Rocha
Polytechnic University of Bari, Bari, Italy
Eufemia Tarantino
Polytechnic University of Bari, Bari, Italy
Carmelo Maria Torre
Department of Neurology, University of Massachusetts Medical School, Worcester, MA, USA
Yeliz Karaca

About this paper

Cite this paper

Okewu, E., Misra, S., Lius, FS. (2020). Parameter Tuning Using Adaptive Moment Estimation in Deep Learning Neural Networks. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2020. ICCSA 2020. Lecture Notes in Computer Science, vol 12254. Springer, Cham. https://doi.org/10.1007/978-3-030-58817-5_20

DOI: https://doi.org/10.1007/978-3-030-58817-5_20
Published: 30 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58816-8
Online ISBN: 978-3-030-58817-5
eBook Packages: Computer Science, Computer Science (R0)
