Stochastic perturbation of subgradient algorithm for nonconvex deep neural networks

  • Published in: Computational and Applied Mathematics

Abstract

Choosing a learning rate is a necessary part of any subgradient-based optimization. For deeper models such as the convolutional neural networks used in image classification, fine-tuning the learning rate quickly becomes tedious and does not always lead to optimal convergence. In this work, we propose a variant of the subgradient method in which the learning rate is updated by a control step at each iteration of each epoch. Our approach, the Stochastic Perturbation Subgradient Algorithm (SPSA), targets image classification problems with deep neural networks, including convolutional neural networks. On the MNIST dataset, the numerical results show that SPSA is faster than Stochastic Gradient Descent and its variants with a fixed learning rate. Moreover, combining SPSA with a convolutional neural network model improves the image classification results in terms of both loss and accuracy.
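The control step and the perturbation schedule that define SPSA are specified in the full paper and are not reproduced on this page. Purely as a hedged sketch of the idea stated in the abstract, the Python snippet below performs a subgradient step whose learning rate is re-evaluated at every iteration and whose iterate receives a vanishing random perturbation; the function name spsa_like_step, the halving rule, the parameters sigma and decay, and the toy objective are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def spsa_like_step(w, subgrad, lr, k, sigma=1e-3, decay=0.5):
    """One illustrative iteration: a controlled learning rate plus a stochastic
    perturbation. This is only a sketch of the idea described in the abstract,
    not the algorithm defined in the paper."""
    g = subgrad(w)
    # Hypothetical control step: halve the learning rate if the trial step
    # would increase the subgradient norm, otherwise keep it unchanged.
    if np.linalg.norm(subgrad(w - lr * g)) > np.linalg.norm(g):
        lr *= decay
    # Stochastic perturbation whose variance vanishes as the iteration count grows.
    noise = sigma / np.sqrt(k + 1) * np.random.randn(*w.shape)
    return w - lr * g + noise, lr

# Usage on a toy nonsmooth, nonconvex objective f(w) = |w_1| + sin(w_2):
f_subgrad = lambda w: np.array([np.sign(w[0]), np.cos(w[1])])
w, lr = np.array([2.0, 1.0]), 0.1
for k in range(100):
    w, lr = spsa_like_step(w, f_subgrad, lr, k)
```

Making the perturbation variance shrink with the iteration counter is one common way to let such a method behave like a plain subgradient step once it approaches a good region.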

Notes

  1. CNNs, as introduced in LeCun et al. (1989), make use of weight sharing (see Sect. 4), which reduces the complexity and size of the network and makes it possible to train deep architectures; a small illustrative parameter count is sketched after these notes.

  2. Usually this includes gradient descent optimization (Singh et al. 2015; Tuyen and Hang-Tuan 2021), as discussed in Sect. 5, together with error backpropagation, introduced in Sect. 3, to evaluate the gradient of a chosen loss function; a minimal training-loop sketch appears after these notes.

  3. The multilayer perceptron is discussed in detail in Sect. 3.

  4. In its most general form, a directed graph is an ordered pair \(G = (V,E)\), where V is a set of nodes and E is a set of edges linking them. Within the graph, \((u,v) \in E\) denotes the presence of a directed edge from node u to node v. Given two units u and v in a network graph, a directed edge from u to v indicates that the output of unit u is used as input by unit v (see the adjacency sketch after these notes).

  5. A K-layer perceptron, on the other hand, is made up of \((K+1)\) layers, including the input layer. The input layer remains uncounted (or is numbered as layer zero), since it performs no processing: the input units compute the identity (Bishop 1995, 2006). A forward-pass sketch combining Notes 5–7 is given after these notes.

  6. The objective is to assign \(\textbf{x}\) to one among \(\eta _K\) discrete classes, using the outputs \(\textbf{h}^{(K)}\) (Bishop 2006; Stutz 2014).

  7. A one-hot vector v is then a binary vector with a single non-zero component, which takes the value 1.

  8. Weight decay is a term used to describe the \(\ell _2\)-regularization; see Bishop (1995) for more information.

  9. For \(p = 1\), the norm \(\Vert \cdot \Vert _1\) is defined as \(\Vert \textbf{w}\Vert _1 =\sum _{\ell = 1}^{K} \sum _{i=1}^{\eta _\ell }\sum _{j=1}^{\eta _{\ell -1}}{\vert w_{ij}^{(\ell )} \vert }\).

  10. By averaging the predictions of different models, model averaging attempts to reduce inaccuracy (Hinton et al. 2012).
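As a back-of-the-envelope illustration of the weight sharing mentioned in Note 1: a convolutional layer reuses the same small kernel at every spatial position, so its parameter count does not grow with the image size, unlike a fully connected layer. The concrete shapes below (a 28x28 single-channel input, 32 feature maps, 5x5 kernels) are assumptions chosen only to make the arithmetic explicit.

```python
# Fully connected layer mapping a flattened 28x28 image to 32 units:
dense_params = 28 * 28 * 32 + 32   # weights + biases = 25,120

# Convolutional layer with 32 feature maps and 5x5 kernels on one input channel:
# the same 5x5 kernel is shared across every spatial position.
conv_params = 5 * 5 * 1 * 32 + 32  # weights + biases = 832

print(dense_params, conv_params)   # 25120 832
```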
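A minimal sketch of the training loop that Note 2 alludes to, under simplifying assumptions (a one-layer least-squares model with a fixed learning rate): backpropagation here reduces to the analytic gradient of the mean squared error, which gradient descent then uses to update the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))                      # toy inputs
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(100)

w, lr = np.zeros(3), 0.05                              # weights and learning rate
for epoch in range(200):
    pred = X @ w                                       # forward pass
    grad = 2.0 / len(X) * X.T @ (pred - y)             # gradient of the MSE loss
    w -= lr * grad                                     # gradient descent update
```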
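The graph definition in Note 4 maps directly onto an adjacency structure; in the sketch below the node names are arbitrary, and each recorded edge (u, v) means that the output of unit u feeds unit v.

```python
# Directed graph G = (V, E) stored as an adjacency mapping: node -> successors.
V = {"u1", "u2", "u3"}
E = {("u1", "u3"), ("u2", "u3")}   # outputs of u1 and u2 feed unit u3

adjacency = {v: set() for v in V}
for u, v in E:
    adjacency[u].add(v)            # directed edge from u to v

print(adjacency["u1"])             # {'u3'}
```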
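Putting Notes 5–7 together in one hedged sketch: a K-layer perceptron whose layer 0 is the (uncounted) identity, whose outputs \(\textbf{h}^{(K)}\) carry one score per class, and whose softmax prediction can be compared with a one-hot target. The layer sizes, the ReLU hidden activation and the random initialization are illustrative assumptions only.

```python
import numpy as np

sizes = [784, 128, 10]                                 # eta_0, eta_1, eta_2 (K = 2)
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((m, n)) * 0.01 for n, m in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(m) for m in sizes[1:]]

def forward(x):
    h = x                                              # layer 0: identity (uncounted)
    for ell, (W, b) in enumerate(zip(Ws, bs), start=1):
        z = W @ h + b
        h = np.maximum(z, 0) if ell < len(Ws) else z   # ReLU hidden layers, linear output
    return h                                           # h^(K): one score per class

scores = forward(rng.standard_normal(784))
probs = np.exp(scores - scores.max())
probs /= probs.sum()                                   # softmax over the eta_K classes
predicted_class = int(np.argmax(probs))

one_hot_target = np.zeros(10)
one_hot_target[3] = 1.0                                # one-hot vector for class 3
```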
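For the regularization terms in Notes 8 and 9, a short sketch: weight decay adds an \(\ell _2\) penalty (commonly \(\frac{\lambda }{2}\Vert \textbf{w}\Vert _2^2\)) to the loss, and the \(\ell _1\) term sums the absolute values of all layer weights exactly as in the displayed formula. The two-layer weight list and the coefficient lam are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
Ws = [rng.standard_normal((128, 784)), rng.standard_normal((10, 128))]  # per-layer weights
lam = 1e-4                                                              # assumed coefficient

l2_penalty = 0.5 * lam * sum(np.sum(W ** 2) for W in Ws)  # the weight-decay term of Note 8
l1_norm = sum(np.sum(np.abs(W)) for W in Ws)              # the norm defined in Note 9
```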
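Note 10's model averaging is simply the mean of the per-model predictions; in the sketch below, three stand-in "models" returning fixed probability vectors take the place of independently trained networks.

```python
import numpy as np

def average_predictions(models, x):
    """Average the class-probability predictions of several trained models."""
    return np.mean([m(x) for m in models], axis=0)

# Three stand-in "models", each returning a probability vector over 3 classes.
models = [lambda x: np.array([0.7, 0.2, 0.1]),
          lambda x: np.array([0.6, 0.3, 0.1]),
          lambda x: np.array([0.5, 0.4, 0.1])]
print(average_predictions(models, None))   # [0.6 0.3 0.1]
```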

References

  • Bagirov AM, Jin L, Karmitsa N, Al Nuaimat A, Sultanova N (2013) Subgradient method for nonconvex nonsmooth optimization. J Optim Theory Appl 157:416–435

  • Bengio Y (2009) Learning deep architectures for AI. Found Trends Mach Learn 2(1):1–127

  • Bishop C (1995) Neural networks for pattern recognition. Clarendon Press, Oxford

  • Bishop C (2006) Pattern recognition and machine learning. Springer, New York

  • Bojarski M, Del Testa D, Dworakowski D, Firner B, Flepp B, Goyal P, Jackel LD, Monfort M, Muller U, Zhang J et al (2016) End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316

  • Botev A, Lever G, Barber D (2017) Nesterov’s accelerated gradient and momentum as approximations to regularised update descent. In: Neural networks (IJCNN) 2017 international joint conference on, pp 1899–1903

  • Ciresan DC, Meier U, Schmidhuber J (2012) Multi-column deep neural networks for image classification. Comput Res Repos. arXiv:abs/1202.2745

  • Cui Y, He Z, Pang J (2020) Multicomposite nonconvex optimization for training deep neural networks. SIAM J Optim 30(2):1693–1723

  • Dem’yanov VF, Vasil’ev LV (1985) Nondifferentiable optimization. Optimization Software, Inc., Publications Division, New York

  • Duchi JC, Hazan E, Singer Y (2011) Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 12:2121–2159

  • Duda R, Hart P, Stork D (2001) Pattern classification. Wiley, New York

  • El Jaafari I, Ellahyani A, Charfi S (2021) Parametric rectified nonlinear unit (PRenu) for convolution neural networks. J Signal Image Video Process (SIViP) 15:241–246

  • El Mouatasim A (2018) Implementation of reduced gradient with bisection algorithms for non-convex optimization problem via stochastic perturbation. J Numer Algorithms 78(1):41–62

  • El Mouatasim A (2019) Control proximal gradient algorithm for \(\ell _1\) regularization image. J Signal Image Video Process (SIViP) 13(6):1113–1121

  • El Mouatasim A (2020) Fast gradient descent algorithm for image classification with neural networks. J Signal Image Video Process (SIViP) 14:1565–1572

  • El Mouatasim A, Wakrim M (2015) Control subgradient algorithm for image regularization. J Signal Image Video Process (SIViP) 9:275–283

  • El Mouatasim A, Ellaia R, Souza de Cursi JE (2006) Random perturbation of variable metric method for unconstraint nonsmooth nonconvex optimization. Appl Math Comput Sci 16(4):463–474

  • El Mouatasim A, Ellaia R, Souza de Cursi JE (2011) Projected variable metric method for linear constrained nonsmooth global optimization via perturbation stochastic. Int J Appl Math Comput Sci 21(2):317–329

  • El Mouatasim A, Ellaia R, Souza de Cursi JE (2014) Stochastic perturbation of reduced gradient & GRG methods for nonconvex programming problems. J Appl Math Comput 226:198–211

  • Feng J, Lu S (2019) Performance analysis of various activation functions in artificial neural networks. J Phys Conf Ser. https://doi.org/10.1088/1742-6596/1237/2/022030

  • Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: International conference on artificial intelligence and statistics, pp 249–256

  • Haykin S (2005) Neural networks a comprehensive foundation. Pearson Education, New Delhi

  • Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov R (2012) Improving neural networks by preventing co-adaptation of feature detectors. Comput Res Repos. arXiv:abs/1207.0580

  • Huang K, Hussain A, Wang Q, Zhang R (2019) Deep learning: fundamentals, theory and applications. Springer, Berlin

  • Jarrett K, Kavukcuogl K, Ranzato M, LeCun Y (2009) What is the best multi-stage architecture for object recognition? In: International conference on computer vision, pp 2146–2153

  • Josef S (2022) A few samples from the MNIST test dataset. https://commons.wikimedia.org/wiki/File:MnistExamples.png. Accessed 12 Dec. Under Creative Commons Attribution-ShareAlike 4.0 International License

  • Khalij L, de Cursi ES (2021) Uncertainty quantification in data fitting neural and Hilbert networks. In: Proceedings of the 5th international symposium on uncertainty quantification and stochastic modelling, pp 222–241. https://doi.org/10.1007/978-3-030-53669-5_17

  • Kingma DP, Ba JL (2015) Adam: a method for stochastic optimization. In: Proceedings of the 3rd international conference on learning representations, San Diego, CA

  • Konstantin E, Johannes S (2019) A comparison of deep networks with ReLU activation function and linear spline-type methods. Neural Netw 110:232–242

  • Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 60:1097–1105

  • Kutyniok G (2022) The mathematics of artificial intelligence. arXiv preprint arXiv:2203.08890

  • LeCun Y (1989) Generalization and network design strategies. Connect Perspect 19:143–155

  • LeCun Y, Cortes C (2010) MNIST handwritten digit database

  • LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD (1989) Backpropagation applied to handwritten zip code recognition. Neural Comput 1(4):541–551

  • LeCun Y, Kavukvuoglu K, Farabet C (2010) Convolutional networks and applications in vision. In: International symposium on circuits and systems, vol 5, pp 253–256

  • Liu Z, Liu H (2019) An efficient gradient method with approximately optimal stepsize based on tensor model for unconstrained optimization. J Optim Theory Appl 181:608–633

  • Li J, Yang X (2020) A cyclical learning rate method in deep learning training. In: International conference on computer, information and telecommunication systems (CITS), pp 1–5

  • Minsky ML (1954) Theory of neural-analog reinforcement systems and its application to the brain-model problem. Ph.D. dissertation, Princeton University

  • Nakamura K, Derbel B, Won K-J, Hong B-W (2021) Learning-rate annealing methods for deep neural networks. Electronics 10:2029

  • Neutelings I (2022) Graphics with TikZ in LaTeX. Neural networks. https://tikz.net/neura_networks. Accessed 12 Dec. Under Creative Commons Attribution-ShareAlike 4.0 International License

  • Pelletier C, Webb GI, Petitjean F (2019) Temporal convolutional neural network for the classification of satellite image time series. Remote Sens 11(5):523

  • Pogu M, Souza de Cursi JE (1994) Global optimization by random perturbation of the gradient method with a fixed parameter. J Global Optim 5:159–180

  • Rosenblatt F (1958) The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev 65(6):386–408. https://doi.org/10.1037/h0042519

  • Singh BK, Verma K, Thoke AS (2015) Adaptive gradient descent backpropagation for classification of breast tumors in ultrasound imaging. Procedia Comput Sci 46:1601–1609

  • Stutz D (2014) Understanding convolutional neural networks. Seminar report, Fakultät für Mathematik, Informatik und Naturwissenschaften

  • Szandała T (2021) Review and comparison of commonly used activation functions for deep neural networks. In: Bhoi A, Mallick P, Liu CM, Balas V (eds) Bio-inspired neurocomputing. Studies in computational intelligence, vol 903. Springer, Singapore. https://doi.org/10.1007/978-981-15-5495-7_11

  • Tuyen TT, Hang-Tuan N (2021) Backtracking gradient descent method and some applications in large scale optimisation. Part 2. Appl Math Optim 84:2557–2586

  • Uryas’ev SP (1991) New variable-metric algorithms for nondifferentiable optimization problems. J Optim Theory Appl 71(2):359–388

  • Wójcik B, Maziarka L, Tabor J (2018) Automatic learning rate in gradient descent. Schedae Inf 27:47–57

  • Xinhua L, Qian Y (2015) Face recognition based on deep neural network. Int J Signal Process Image Process Pattern Recogn 8(10):29–38

  • Zeiler MD, Fergus R (2013) Visualizing and understanding convolutional networks. Comput Res Repos. arXiv:abs/1311.2901

Acknowledgements

We are indebted to the anonymous Reviewers and Editors for their many helpful recommendations and insightful remarks that helped us improve the original article.

Author information

Corresponding author

Correspondence to A. El Mouatasim.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare. All co-authors have seen and agree with the contents of the manuscript “Stochastic Perturbation of Subgradient Algorithm for Nonconvex Deep Neural Networks” and there is no financial interest to report. We certify that the submission is original work and is not under review at any other publication.

Additional information

Communicated by Antonio José Silva Neto.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

El Mouatasim, A., de Cursi, J.E.S. & Ellaia, R. Stochastic perturbation of subgradient algorithm for nonconvex deep neural networks. Comp. Appl. Math. 42, 167 (2023). https://doi.org/10.1007/s40314-023-02307-9

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s40314-023-02307-9

Keywords

Mathematics Subject Classification
