Abstract
The stochastic optimization problem in deep learning involves finding optimal values of the loss function and the neural network parameters using a meta-heuristic search algorithm. Because these values cannot reasonably be obtained with a deterministic optimization technique, an iterative method is needed that randomly samples data segments, arbitrarily initializes the optimization (network) parameters, and repeatedly computes the error function until a tolerable error is attained. The standard stochastic optimization algorithm for training deep neural networks, a non-convex optimization problem, is gradient descent, with extensions such as Stochastic Gradient Descent (SGD), Adagrad, Adadelta, RMSProp, and Adam. Each of these stochastic optimizers improves on its predecessors in terms of accuracy, convergence rate, and training time, yet there remains room for further improvement. This paper presents the outcomes of a series of experiments conducted to provide empirical evidence of the progress made so far. We used Python deep learning libraries (TensorFlow and the Keras API) for our experiments. Each algorithm was executed, the results were collated, and a case is made for further research in deep learning to improve the training time, convergence rate, and accuracy of deep neural networks. This is in response to the growing demand for deep learning in mission-critical and highly sophisticated decision-making processes across industry verticals.
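For concreteness, the sketch below illustrates how such an optimizer comparison can be set up with TensorFlow and the Keras API. It is a minimal illustration rather than the authors' exact experimental setup: the MNIST-style dataset, the small feedforward architecture, the epoch count, and the learning rates are assumptions chosen for brevity.

```python
# Minimal sketch (assumed setup, not the paper's exact configuration):
# train the same small network with each optimizer and compare accuracy.
import tensorflow as tf
from tensorflow import keras

# MNIST is used here only as an illustrative benchmark.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

def build_model():
    # Illustrative feedforward network; depth and width are arbitrary choices.
    return keras.Sequential([
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])

# The optimizers compared in the paper; hyperparameters are Keras defaults
# except for the SGD learning rate, which is an assumed value.
optimizers = {
    "SGD": keras.optimizers.SGD(learning_rate=0.01),
    "Adagrad": keras.optimizers.Adagrad(),
    "Adadelta": keras.optimizers.Adadelta(),
    "RMSProp": keras.optimizers.RMSprop(),
    "Adam": keras.optimizers.Adam(),
}

for name, opt in optimizers.items():
    model = build_model()
    model.compile(optimizer=opt,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(x_train, y_train, epochs=5,
                        validation_data=(x_test, y_test), verbose=0)
    print(f"{name}: final validation accuracy "
          f"{history.history['val_accuracy'][-1]:.4f}")
```

Running each optimizer against an identical architecture and data split, as above, is one straightforward way to attribute differences in accuracy, convergence rate, and training time to the optimizer itself rather than to the model.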