
Long-term temporal averaging for stochastic optimization of deep neural networks

  • S.I.: EANN 2017
  • Neural Computing and Applications

Abstract

Deep learning models can successfully tackle many difficult tasks. However, training deep neural models is not always straightforward, owing to several well-known issues such as vanishing and exploding gradients. Furthermore, the stochastic nature of the optimization process inevitably leads to instabilities during training, even when state-of-the-art stochastic optimization techniques are used. In this work, we propose an advanced temporal averaging technique that stabilizes the convergence of stochastic optimization for neural network training. The proposed method is extensively evaluated on six different datasets and evaluation setups, demonstrating its performance benefits. The more stable convergence also reduces the risk of terminating training just after a bad descent step has been taken or when the learning rate has not been set appropriately.
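The core idea, averaging the network weights over time, can be illustrated with a minimal sketch. The snippet below shows generic exponential moving averaging of SGD iterates, in the spirit of Polyak-Ruppert averaging; it is not the authors' exact algorithm, and the toy regression problem, the step size `lr`, and the averaging coefficient `beta` are illustrative assumptions.

```python
# Minimal sketch of temporal (moving-average) weight smoothing for SGD.
# Illustrates the general idea of averaging iterates, NOT the exact
# algorithm proposed in the paper; hyperparameters are assumed values.
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: noisy linear regression fitted with plain SGD.
true_w = np.array([2.0, -3.0])
X = rng.normal(size=(1000, 2))
y = X @ true_w + 0.5 * rng.normal(size=1000)

w = np.zeros(2)        # "live" weights updated by SGD
w_avg = np.zeros(2)    # temporally averaged weights used for evaluation
lr, beta = 0.05, 0.99  # step size and averaging coefficient (assumed)

for t in range(1, 5001):
    i = rng.integers(len(X))                 # draw one training sample
    grad = 2.0 * (X[i] @ w - y[i]) * X[i]    # stochastic gradient of squared error
    w -= lr * grad                           # noisy SGD step
    w_avg = beta * w_avg + (1.0 - beta) * w  # exponential moving average of weights

print("Last SGD iterate:", w)      # fluctuates around the optimum
print("Averaged iterate:", w_avg)  # markedly more stable estimate
```

Because the averaged weights integrate information from many past iterates, a single bad descent step perturbs `w_avg` far less than `w`, which is the stabilizing effect described above.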

Acknowledgements

The research leading to these results has been partially funded by the European Union’s Horizon 2020 research and innovation program under Grant Agreement No. 731667 (MULTIDRONE). This publication reflects the authors’ views only. The European Commission is not responsible for any use that may be made of the information it contains. The authors would like to thank the anonymous reviewers for their helpful and constructive comments, which greatly contributed to improving the final version of this manuscript.

Author information

Corresponding author

Correspondence to Nikolaos Passalis.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Cite this article

Passalis, N., Tefas, A. Long-term temporal averaging for stochastic optimization of deep neural networks. Neural Comput & Applic 31, 1733–1745 (2019). https://doi.org/10.1007/s00521-018-3712-x
