Abstract
We propose a new metaheuristic training scheme for Machine Learning that combines Stochastic Gradient Descent (SGD) and Discrete Optimization in an unconventional way. Our idea is to define a discrete neighborhood of the current SGD point containing a number of “potentially good moves” that exploit gradient information, and to search this neighborhood with a classical metaheuristic scheme borrowed from Discrete Optimization. In the present paper we investigate a simple Simulated Annealing (SA) metaheuristic that accepts/rejects a candidate new solution in the neighborhood with a probability that depends both on the quality of the new solution and on a parameter (the temperature), which is lowered over time so that worsening moves become progressively less likely to be accepted.
Computational results on image classification (CIFAR-10) are reported, showing that the proposed approach improves the final validation accuracy of modern Deep Neural Networks such as ResNet34 and VGG16.
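As a rough illustration of the scheme described above, the following minimal Python sketch trains a toy least-squares model with gradient-based candidate moves whose acceptance is governed by a Metropolis/SA test. The discrete neighborhood (here, a small set of candidate step sizes), the geometric cooling schedule, and the synthetic data are illustrative assumptions of ours, not the exact setup used in the paper.

import numpy as np

rng = np.random.default_rng(0)

# Toy problem: a synthetic least-squares loss stands in for the training loss
# of a deep network (illustrative assumption, not the paper's setting).
X = rng.normal(size=(256, 20))
w_true = rng.normal(size=20)
y = X @ w_true + 0.1 * rng.normal(size=256)

def loss(w, idx):
    r = X[idx] @ w - y[idx]
    return 0.5 * np.mean(r ** 2)

def grad(w, idx):
    return X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)

w = np.zeros(20)
T = 1.0                                  # initial temperature (assumed value)
cooling = 0.95                           # geometric cooling factor (assumed schedule)
lr_choices = [0.001, 0.01, 0.05, 0.1]    # discrete neighborhood of gradient-based moves

for it in range(200):
    idx = rng.choice(len(X), size=32, replace=False)   # mini-batch indices
    g = grad(w, idx)
    # Candidate move: an SGD step whose step size is drawn from the discrete neighborhood.
    lr = rng.choice(lr_choices)
    w_new = w - lr * g
    delta = loss(w_new, idx) - loss(w, idx)
    # Metropolis acceptance: always accept improving moves; accept worsening moves
    # with probability exp(-delta / T), which shrinks as the temperature decreases.
    if delta <= 0 or rng.random() < np.exp(-delta / T):
        w = w_new
    T *= cooling

print("final training loss:", loss(w, np.arange(len(X))))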
Work supported by MiUR, Italy (project PRIN). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Fischetti, M., Stringher, M. (2021). Embedding Simulated Annealing within Stochastic Gradient Descent. In: Dorronsoro, B., Amodeo, L., Pavone, M., Ruiz, P. (eds) Optimization and Learning. OLA 2021. Communications in Computer and Information Science, vol 1443. Springer, Cham. https://doi.org/10.1007/978-3-030-85672-4_1
DOI: https://doi.org/10.1007/978-3-030-85672-4_1
Publisher: Springer, Cham
Print ISBN: 978-3-030-85671-7
Online ISBN: 978-3-030-85672-4