Abstract
The stepsize plays an important role in the stochastic gradient method. The bandwidth-based stepsize allows the stepsize to be adjusted within a banded region determined by boundary functions. Based on the bandwidth-based stepsize, we propose a new method, SCSG-BD, for smooth non-convex finite-sum optimization problems. For the boundary functions \(1/t\), \(1/(t\log (t + 1))\) and \(1/t^p\) (\(p\in (0,1)\)), SCSG-BD converges sublinearly to a stationary point at a faster rate than the stochastically controlled stochastic gradient (SCSG) method under certain conditions. Moreover, SCSG-BD converges linearly to the solution if the objective function satisfies the Polyak–Łojasiewicz condition. We also introduce the \(1/t\)-Barzilai–Borwein stepsize for practical computation. Numerical experiments demonstrate that SCSG-BD performs better than SCSG and its variants.
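To make the idea of a banded stepsize concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the band at iteration \(t\) is \([m\,\delta(t), M\,\delta(t)]\) for a boundary function \(\delta\) and constants \(0 < m \le M\); the constants, the function names, and the way a Barzilai–Borwein candidate is clipped into the \(1/t\) band are all illustrative assumptions, since the abstract does not give the exact formulas.

```python
import math

def boundary(t, kind="1/t", p=0.5):
    """Boundary function delta(t) for an iteration counter t >= 1."""
    if kind == "1/t":
        return 1.0 / t
    if kind == "1/(t log(t+1))":
        return 1.0 / (t * math.log(t + 1))
    if kind == "1/t^p":  # p is expected to lie in (0, 1)
        return 1.0 / t ** p
    raise ValueError(f"unknown boundary function: {kind}")

def banded_stepsize(candidate, t, m=0.1, M=1.0, kind="1/t", p=0.5):
    """Clip a candidate stepsize into [m * delta(t), M * delta(t)].

    m and M are hypothetical band constants chosen for illustration.
    """
    delta = boundary(t, kind, p)
    return min(max(candidate, m * delta), M * delta)

def bb_candidate(x_prev, x_curr, g_prev, g_curr):
    """Classical BB stepsize s^T s / s^T y from two iterates and gradients."""
    s = [a - b for a, b in zip(x_curr, x_prev)]
    y = [a - b for a, b in zip(g_curr, g_prev)]
    sy = sum(a * b for a, b in zip(s, y))
    ss = sum(a * a for a in s)
    # With non-positive curvature the BB formula is unreliable; returning
    # infinity lets the band's upper boundary take over (an assumption).
    return ss / sy if sy > 0 else float("inf")

# Toy usage on f(x) = x^2, whose gradient is 2x.
x_prev, x_curr = [1.0], [0.5]
g_prev, g_curr = [2.0], [1.0]
eta = banded_stepsize(bb_candidate(x_prev, x_curr, g_prev, g_curr), t=10)
```

In this toy run the BB candidate lies above the upper boundary at \(t = 10\), so the returned stepsize is the upper boundary itself. The band thus acts as a safeguard around whatever rule proposes the stepsize.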







Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Notes
Available at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.
Acknowledgements
The authors would like to thank the associate editor and the anonymous referees for their valuable comments and suggestions. This work was supported by the National Natural Science Foundation of China (Grant No. 11701137) and the Natural Science Foundation of Hebei Province (Grant No. A2021202010).
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
About this article
Cite this article
Liu, C., Huang, Y. & Wang, D. Improving the stochastically controlled stochastic gradient method by the bandwidth-based stepsize. Comput Optim Appl 90, 941–968 (2025). https://doi.org/10.1007/s10589-025-00651-6
DOI: https://doi.org/10.1007/s10589-025-00651-6
Keywords
- Stochastic gradient method
- Bandwidth-based stepsize
- Barzilai–Borwein stepsize
- Non-convex finite-sum optimization