
A mini-batch stochastic conjugate gradient algorithm with variance reduction

Journal of Global Optimization

Abstract

The stochastic gradient descent (SGD) method is popular for large-scale optimization, but its asymptotic convergence is slow because of the inherent variance of the stochastic gradients. To remedy this problem, many explicit variance reduction methods for stochastic gradient descent have been proposed, such as SVRG (Johnson and Zhang, in: Advances in Neural Information Processing Systems, 2013, pp. 315–323), SAG (Roux et al., in: Advances in Neural Information Processing Systems, 2012, pp. 2663–2671) and SAGA (Defazio et al., in: Advances in Neural Information Processing Systems, 2014, pp. 1646–1654). We consider the conjugate gradient method, which has the same per-iteration computational cost as the gradient descent method. In this paper, in the spirit of SAGA, we propose a stochastic conjugate gradient algorithm, which we call SCGA. With Fletcher–Reeves type choices of the conjugate parameter, we prove a linear convergence rate for smooth and strongly convex functions. We experimentally demonstrate that SCGA converges faster than popular SGD-type algorithms on four machine learning models, which may be convex, nonconvex or nonsmooth. On regression problems, SCGA is competitive with CGVR, which, to our knowledge, is the only other stochastic conjugate gradient algorithm with variance reduction.
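
As a rough sketch of the two ingredients named in the abstract, the following Python snippet combines a SAGA-style variance-reduced gradient estimate with a Fletcher-Reeves conjugate direction, d_k = -g_k + beta_k d_{k-1} with beta_k = ||g_k||^2 / ||g_{k-1}||^2, on a least-squares problem. The function name saga_fr_cg, the fixed step size and the periodic restart are illustrative assumptions only; this is not the SCGA algorithm or the step-size rule analysed in the paper.

```python
import numpy as np


def saga_fr_cg(A, b, w0, step=0.02, n_iters=2000, batch=10, restart=20, seed=0):
    """Sketch of a SAGA-style variance-reduced stochastic conjugate gradient
    update for the least-squares objective f(w) = 1/(2n) * ||A w - b||^2.

    Illustration only: the fixed step size and the periodic restart are
    simplifications, not the scheme analysed for SCGA in the paper.
    """
    rng = np.random.default_rng(seed)
    n, _ = A.shape
    w = w0.astype(float)

    # SAGA memory: the most recently evaluated gradient of every sample.
    grad_table = A * (A @ w - b)[:, None]        # shape (n, dim)
    grad_avg = grad_table.mean(axis=0)           # running mean of the table

    g_prev = grad_avg.copy()
    d = -g_prev                                  # initial search direction

    for t in range(n_iters):
        idx = rng.choice(n, size=batch, replace=False)
        g_batch = A[idx] * (A[idx] @ w - b[idx])[:, None]   # fresh per-sample gradients

        # SAGA variance-reduced gradient estimate for this mini-batch.
        g = g_batch.mean(axis=0) - grad_table[idx].mean(axis=0) + grad_avg

        # Update the memory table and its running average.
        grad_avg += (g_batch - grad_table[idx]).sum(axis=0) / n
        grad_table[idx] = g_batch

        if (t + 1) % restart == 0:
            d = -g                               # periodic restart (steepest descent step)
        else:
            # Fletcher-Reeves coefficient: ||g_k||^2 / ||g_{k-1}||^2.
            beta = (g @ g) / max(g_prev @ g_prev, 1e-12)
            d = -g + beta * d

        w += step * d
        g_prev = g
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 5))
    w_true = rng.standard_normal(5)
    b = A @ w_true
    obj = lambda w: 0.5 * np.mean((A @ w - b) ** 2)

    w0 = np.zeros(5)
    w_hat = saga_fr_cg(A, b, w0)
    print(f"objective: {obj(w0):.4f} -> {obj(w_hat):.6f}")
```

The periodic restart is a common practical safeguard for nonlinear conjugate gradient methods with noisy gradients; SCGA's linear convergence guarantee instead relies on the Fletcher-Reeves type choice together with the conditions established in the paper.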

Notes

  1. http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets

  2. http://osmot.cs.cornell.edu/kddcup

  3. http://archive.ics.uci.edu/ml/datasets/Average+Localization+Error+%28ALE%29+in+sensor+node+localization+process+in+WSNs

References

  1. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)

  2. Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing 20(1), 30–42 (2011)

  3. Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.-R., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., et al.: Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29(6), 82–97 (2012)

  4. Collobert, R., Weston, J.: A unified architecture for natural language processing: Deep neural networks with multitask learning. In: Proceedings of the 25th International Conference on Machine Learning, pp. 160–167 (2008)

  5. Dahl, G.E., Stokes, J.W., Deng, L., Yu, D.: Large-scale malware classification using random projections and neural networks. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3422–3426 (2013)

  6. Cauchy, A.: Méthode générale pour la résolution des systèmes d'équations simultanées. Comp. Rend. Sci. Paris 25, 536–538 (1847)

  7. Robbins, H., Monro, S.: A stochastic approximation method. The Annals of Mathematical Statistics 22(3), 400–407 (1951)

  8. Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT'2010, pp. 177–186 (2010)

  9. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)

  10. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964)

  11. Nesterov, Y.: A method of solving a convex programming problem with convergence rate O(1/k^2). In: Soviet Mathematics Doklady (1983)

  12. Qian, N.: On the momentum term in gradient descent learning algorithms. Neural Networks 12(1), 145–151 (1999)

  13. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(7) (2011)

  14. Zeiler, M.D.: ADADELTA: An adaptive learning rate method. arXiv preprint arXiv:1212.5701 (2012)

  15. Tieleman, T., Hinton, G.: Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning 4(2), 26–31 (2012)

  16. Kingma, D., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  17. Hager, W.W., Zhang, H.: A survey of nonlinear conjugate gradient methods. Pacific Journal of Optimization 2(1), 35–58 (2006)

  18. Roux, N.L., Schmidt, M., Bach, F.R.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: Advances in Neural Information Processing Systems, pp. 2663–2671 (2012)

  19. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in Neural Information Processing Systems, pp. 315–323 (2013)

  20. Defazio, A., Bach, F., Lacoste-Julien, S.: SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014)

  21. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: SARAH: A novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of the 34th International Conference on Machine Learning (2017)

  22. Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM Journal on Optimization 2, 21–42 (1992)

  23. Nocedal, J., Wright, S.: Numerical Optimization. Springer Science & Business Media (2006)

  24. Fletcher, R., Reeves, C.M.: Function minimization by conjugate gradients. The Computer Journal 7(2), 149–154 (1964)

  25. Polak, E., Ribière, G.: Note sur la convergence de méthodes de directions conjuguées. ESAIM: Mathematical Modelling and Numerical Analysis - Modélisation Mathématique et Analyse Numérique 3(R1), 35–43 (1969)

  26. Polyak, B.T.: The conjugate gradient method in extremal problems. USSR Computational Mathematics and Mathematical Physics 9(4), 94–112 (1969)

  27. Hestenes, M.R., Stiefel, E.: Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards 49(6), 409–436 (1952)

  28. Dai, Y.H., Yuan, Y.: A nonlinear conjugate gradient method with a strong global convergence property. SIAM Journal on Optimization 10(1), 177–182 (1999)

  29. Hager, W.W., Zhang, H.: A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM Journal on Optimization 16(1), 170–192 (2005)

  30. Dai, Y.H., Kou, C.X.: A nonlinear conjugate gradient algorithm with an optimal property and an improved Wolfe line search. SIAM Journal on Optimization 23(1), 296–320 (2013)

  31. Dai, Y.H., Yuan, Y.: Nonlinear Conjugate Gradient Methods. Shanghai Science and Technology Publisher (2000)

  32. Møller, M.F.: A scaled conjugate gradient algorithm for fast supervised learning. Neural Networks 6(4), 525–533 (1993)

  33. Le, Q.V., Ngiam, J., Coates, A., Lahiri, A., Prochnow, B., Ng, A.Y.: On optimization methods for deep learning. In: ICML (2011)

  34. Moritz, P., Nishihara, R., Jordan, M.I.: A linearly convergent stochastic L-BFGS algorithm (2015)

  35. Jin, X.B., Zhang, X.Y., Huang, K., Geng, G.G.: Stochastic conjugate gradient algorithm with variance reduction. IEEE Transactions on Neural Networks and Learning Systems 30(5), 1360–1369 (2018)


Acknowledgements

We would like to thank the anonymous referees for their helpful comments. We would also like to thank Professor Y. H. Dai for his valuable suggestions. This work was supported by the Chinese NSF grants (Nos. 11971073, 12171052 and 11871115).

Author information

Corresponding author

Correspondence to Caixia Kou.



About this article


Cite this article

Kou, C., Yang, H. A mini-batch stochastic conjugate gradient algorithm with variance reduction. J Glob Optim 87, 1009–1025 (2023). https://doi.org/10.1007/s10898-022-01205-4
