Experimental Comparison of Stochastic Optimizers in Deep Learning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11623)

Abstract

The stochastic optimization problem in deep learning involves finding optimal values of the loss function and the neural network parameters using a meta-heuristic search algorithm. Because these values cannot reasonably be obtained with a deterministic optimization technique, an iterative method is needed that randomly samples data segments, arbitrarily initializes the optimization (network) parameters, and repeatedly computes the error function until a tolerable error is attained. The typical stochastic optimization algorithm for training deep neural networks, a non-convex optimization problem, is gradient descent, with extensions such as Stochastic Gradient Descent, Adagrad, Adadelta, RMSProp, and Adam. Each of these stochastic optimizers represents an improvement in accuracy, convergence rate, and training time, yet there is room for further improvement. This paper presents the outcomes of a series of experiments conducted to provide empirical evidence of the progress made so far. We used Python deep learning libraries (TensorFlow and the Keras API) for our experiments. Each algorithm is executed, the results are collated, and a case is made for further research in deep learning to improve the training time and convergence rate of deep neural networks, as well as the accuracy of outcomes. This is in response to the growing demand for deep learning in mission-critical and highly sophisticated decision-making processes across industry verticals.
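As a concrete illustration of the kind of comparison the abstract describes, the sketch below trains otherwise identical Keras models with each of the listed optimizers and records validation accuracy and wall-clock training time. The small dense network, the MNIST dataset, and the hyperparameter values are illustrative assumptions for this sketch, not the authors' exact experimental setup.

```python
# Minimal sketch (assumed setup, not the authors' exact experiment):
# train identical Keras models with different stochastic optimizers and
# compare final validation accuracy and wall-clock training time.
import time
import tensorflow as tf

# Load and scale MNIST (illustrative dataset choice).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

def build_model():
    # Same small feed-forward architecture for every optimizer.
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

# Learning rates here are common defaults, not tuned values.
optimizers = {
    "SGD": tf.keras.optimizers.SGD(learning_rate=0.01),
    "Adagrad": tf.keras.optimizers.Adagrad(learning_rate=0.01),
    "Adadelta": tf.keras.optimizers.Adadelta(learning_rate=1.0),
    "RMSProp": tf.keras.optimizers.RMSprop(learning_rate=0.001),
    "Adam": tf.keras.optimizers.Adam(learning_rate=0.001),
}

for name, opt in optimizers.items():
    model = build_model()
    model.compile(optimizer=opt,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    start = time.time()
    history = model.fit(x_train, y_train, epochs=5, batch_size=128,
                        validation_data=(x_test, y_test), verbose=0)
    elapsed = time.time() - start
    print(f"{name}: val_acc={history.history['val_accuracy'][-1]:.4f}, "
          f"time={elapsed:.1f}s")
```

Running such a script per optimizer is one straightforward way to tabulate accuracy, convergence behaviour (via the per-epoch history), and training time on a common baseline.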

Author information

Corresponding author

Correspondence to Emmanuel Okewu.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Okewu, E., Adewole, P., Sennaike, O. (2019). Experimental Comparison of Stochastic Optimizers in Deep Learning. In: Misra, S., et al. (eds.) Computational Science and Its Applications – ICCSA 2019. ICCSA 2019. Lecture Notes in Computer Science, vol 11623. Springer, Cham. https://doi.org/10.1007/978-3-030-24308-1_55

  • DOI: https://doi.org/10.1007/978-3-030-24308-1_55

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-24307-4

  • Online ISBN: 978-3-030-24308-1

  • eBook Packages: Computer Science, Computer Science (R0)
