Improving the stochastically controlled stochastic gradient method by the bandwidth-based stepsize

Liu, Chenchen; Huang, Yakui; Wang, Dan

doi:10.1007/s10589-025-00651-6

Published: 23 January 2025

Volume 90, pages 941–968, (2025)

Computational Optimization and Applications

Abstract

Stepsize plays an important role in the stochastic gradient method. The bandwidth-based stepsize allows us to adjust the stepsize within a banded region determined by some boundary functions. Based on the bandwidth-based stepsize, we propose a new method, namely SCSG-BD, for smooth non-convex finite-sum optimization problems. For the boundary functions 1/t, $1/(t\log (t + 1))$ and $1/t^p$ ($p\in (0,1)$), SCSG-BD converges sublinearly to a stationary point at a faster rate than the stochastically controlled stochastic gradient (SCSG) method under certain conditions. Moreover, SCSG-BD is able to converge linearly to the solution if the objective function satisfies the Polyak–Łojasiewicz condition. We also introduce the 1/t-Barzilai–Borwein stepsize for practical computation. Numerical experiments demonstrate that SCSG-BD performs better than SCSG and its variants.
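
The idea of a banded stepsize can be sketched in a few lines. The snippet below is only an illustration under stated assumptions, not the SCSG-BD algorithm itself: it evaluates the three boundary functions named in the abstract and clips a candidate stepsize into a band `[lower * delta(t), upper * delta(t)]`. The function names, the band constants `lower` and `upper`, and the use of the classical Barzilai–Borwein ratio $s^\top s / s^\top y$ as the candidate are all assumptions made for this sketch; the abstract does not give the paper's precise definition of the 1/t-Barzilai–Borwein stepsize.

```python
import math


def boundary(kind, t, p=0.5):
    """Boundary function delta(t) for an iteration counter t >= 1.

    The three families are those named in the abstract; the string
    labels used to select them are hypothetical.
    """
    if t < 1:
        raise ValueError("t must be at least 1")
    if kind == "1/t":
        return 1.0 / t
    if kind == "1/(t log(t+1))":
        return 1.0 / (t * math.log(t + 1))
    if kind == "1/t^p":
        if not 0.0 < p < 1.0:
            raise ValueError("p must lie in (0, 1)")
        return 1.0 / t ** p
    raise ValueError(f"unknown boundary function: {kind}")


def clip_to_band(candidate, t, kind="1/t", lower=0.1, upper=1.0, p=0.5):
    """Project a candidate stepsize onto [lower*delta(t), upper*delta(t)].

    `lower` and `upper` are assumed band constants with 0 < lower <= upper.
    """
    delta = boundary(kind, t, p)
    return min(max(candidate, lower * delta), upper * delta)


def bb_candidate(s, y):
    """Classical long Barzilai-Borwein stepsize s^T s / s^T y.

    s is the difference of two iterates and y the difference of the
    corresponding gradients. Returns None when s^T y <= 0, in which
    case the ratio is not a usable positive stepsize.
    """
    sy = sum(a * b for a, b in zip(s, y))
    if sy <= 0.0:
        return None
    return sum(a * a for a in s) / sy
```

In this sketch a caller would compute `bb_candidate(s, y)` from two successive iterates, fall back to a default such as `upper * boundary(kind, t)` when it returns `None`, and pass the result through `clip_to_band` so that the stepsize actually used always stays inside the band.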

Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Notes

Available at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/

Acknowledgements

The authors would like to thank the associate editor and the anonymous referees for their valuable comments and suggestions. This work was supported by the National Natural Science Foundation of China (Grant No. 11701137) and the Natural Science Foundation of Hebei Province (Grant No. A2021202010).

Author information

Authors and Affiliations

School of Sciences, Hebei University of Technology, Tianjin, 300401, China
Chenchen Liu & Yakui Huang
Institute of Mathematics, Hebei University of Technology, Tianjin, 300401, China
Yakui Huang
School of Artificial Intelligence, Hebei University of Technology, Tianjin, 300401, China
Dan Wang

Corresponding author

Correspondence to Yakui Huang.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, C., Huang, Y. & Wang, D. Improving the stochastically controlled stochastic gradient method by the bandwidth-based stepsize. Comput Optim Appl 90, 941–968 (2025). https://doi.org/10.1007/s10589-025-00651-6

Received: 04 July 2023
Accepted: 13 January 2025
Published: 23 January 2025
Issue Date: April 2025
DOI: https://doi.org/10.1007/s10589-025-00651-6
