Abstract
We propose a new metaheuristic training scheme for Machine Learning that combines Stochastic Gradient Descent (SGD) and Discrete Optimization in an unconventional way. Our idea is to define a discrete neighborhood of the current SGD point containing a number of “potentially good moves” that exploit gradient information, and to search this neighborhood with a classical metaheuristic scheme borrowed from Discrete Optimization. In the present paper we investigate a simple Simulated Annealing (SA) metaheuristic that accepts/rejects a candidate new solution in the neighborhood with a probability that depends both on the quality of the new solution and on a parameter (the temperature), which is lowered over time so that worsening moves become progressively less likely to be accepted.
Computational results on image classification (CIFAR-10) are reported, showing that the proposed approach improves the final validation accuracy of modern Deep Neural Networks such as ResNet34 and VGG16.
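As a rough illustration of the scheme described above, the following minimal Python sketch trains a toy least-squares model with gradient-based candidate moves whose acceptance is governed by a Metropolis/SA test. The discrete neighborhood (here, a small set of candidate step sizes), the geometric cooling schedule, and the synthetic data are illustrative assumptions of ours, not the exact setup used in the paper.

import numpy as np

rng = np.random.default_rng(0)

# Toy problem: a synthetic least-squares loss stands in for the training loss
# of a deep network (illustrative assumption, not the paper's setting).
X = rng.normal(size=(256, 20))
w_true = rng.normal(size=20)
y = X @ w_true + 0.1 * rng.normal(size=256)

def loss(w, idx):
    r = X[idx] @ w - y[idx]
    return 0.5 * np.mean(r ** 2)

def grad(w, idx):
    return X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)

w = np.zeros(20)
T = 1.0                                  # initial temperature (assumed value)
cooling = 0.95                           # geometric cooling factor (assumed schedule)
lr_choices = [0.001, 0.01, 0.05, 0.1]    # discrete neighborhood of gradient-based moves

for it in range(200):
    idx = rng.choice(len(X), size=32, replace=False)   # mini-batch indices
    g = grad(w, idx)
    # Candidate move: an SGD step whose step size is drawn from the discrete neighborhood.
    lr = rng.choice(lr_choices)
    w_new = w - lr * g
    delta = loss(w_new, idx) - loss(w, idx)
    # Metropolis acceptance: always accept improving moves; accept worsening moves
    # with probability exp(-delta / T), which shrinks as the temperature decreases.
    if delta <= 0 or rng.random() < np.exp(-delta / T):
        w = w_new
    T *= cooling

print("final training loss:", loss(w, np.arange(len(X))))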
Work supported by MiUR, Italy (project PRIN). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Fischetti, M., Stringher, M. (2021). Embedding Simulated Annealing within Stochastic Gradient Descent. In: Dorronsoro, B., Amodeo, L., Pavone, M., Ruiz, P. (eds) Optimization and Learning. OLA 2021. Communications in Computer and Information Science, vol 1443. Springer, Cham. https://doi.org/10.1007/978-3-030-85672-4_1
DOI: https://doi.org/10.1007/978-3-030-85672-4_1
Publisher: Springer, Cham
Print ISBN: 978-3-030-85671-7
Online ISBN: 978-3-030-85672-4