A robust stochastic quasi-Newton algorithm for non-convex machine learning

Published in Applied Intelligence

Abstract

Stochastic quasi-Newton methods have garnered considerable attention in large-scale machine learning optimization. However, a stochastic gradient that equals zero prevents the quasi-Newton matrix from being updated, which undermines the stability of the quasi-Newton algorithm. To address this issue, a checkpoint mechanism is introduced, i.e., the value of \(\textbf{s}_k\) is checked before the quasi-Newton matrix is updated, which prevents zero increments in the optimization variable and enhances algorithmic stability across iterations. Meanwhile, a novel gradient-increment formulation is introduced to satisfy the curvature condition, facilitating convergence for non-convex objectives. Additionally, a limited-memory technique is employed to reduce storage requirements in large-scale machine learning tasks. The last iterate of the proposed algorithm is proven to converge in the non-convex setting, a stronger guarantee than convergence of the average or best iterate. Finally, experiments on benchmark datasets compare the proposed RSLBFGS algorithm with other popular first- and second-order methods, demonstrating the effectiveness and robustness of RSLBFGS.
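To make the checkpoint idea concrete, the following is a minimal sketch of one stochastic limited-memory BFGS step in which the curvature pair \((\textbf{s}_k, \textbf{y}_k)\) is stored only when the increment \(\textbf{s}_k\) is bounded away from zero and the curvature condition \(\textbf{s}_k^\top \textbf{y}_k > 0\) holds. The threshold eps, the damping term delta * s, and all helper names are illustrative assumptions; the sketch does not reproduce the paper's exact gradient-increment formula or step-size rule.

```python
# Minimal sketch (not the paper's exact RSLBFGS update): one stochastic L-BFGS
# step with a checkpoint on s_k and a safeguarded gradient increment y_k.
import numpy as np
from collections import deque


def lbfgs_direction(grad, s_hist, y_hist):
    """Two-loop recursion: approximate H_k @ grad from the stored (s, y) pairs."""
    q = grad.copy()
    m = len(s_hist)
    alphas = [0.0] * m
    rhos = [1.0 / np.dot(y_hist[i], s_hist[i]) for i in range(m)]
    for i in range(m - 1, -1, -1):                 # newest -> oldest
        alphas[i] = rhos[i] * np.dot(s_hist[i], q)
        q -= alphas[i] * y_hist[i]
    if m > 0:                                      # initial scaling gamma_k * I
        q *= np.dot(s_hist[-1], y_hist[-1]) / np.dot(y_hist[-1], y_hist[-1])
    for i in range(m):                             # oldest -> newest
        beta = rhos[i] * np.dot(y_hist[i], q)
        q += (alphas[i] - beta) * s_hist[i]
    return q


def stochastic_lbfgs_step(w, stoch_grad, lr, s_hist, y_hist,
                          memory=10, eps=1e-8, delta=1e-4):
    """One iteration; eps, delta and the y_k formula are illustrative choices."""
    g = stoch_grad(w)
    d = lbfgs_direction(g, list(s_hist), list(y_hist))
    w_new = w - lr * d

    s = w_new - w
    # Checkpoint: skip the curvature-pair update when the increment is (near)
    # zero, e.g. because the stochastic gradient vanished on this mini-batch.
    if np.linalg.norm(s) > eps:
        y = stoch_grad(w_new) - g + delta * s      # damped gradient increment
        if np.dot(s, y) > 0:                       # curvature condition s^T y > 0
            s_hist.append(s)
            y_hist.append(y)
            if len(s_hist) > memory:               # limited memory: keep last m pairs
                s_hist.popleft()
                y_hist.popleft()
    return w_new
```

Here stoch_grad is any mini-batch gradient oracle, and s_hist and y_hist are collections.deque buffers holding only the most recent curvature pairs, which corresponds to the limited-memory aspect mentioned in the abstract.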




Availability of Data and Materials

Only public datasets, available at www.csie.ntu.edu.tw/~cjlin/libsvmtools/, are used.
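For readers reproducing the experiments, the snippet below shows one common way to load a dataset in LIBSVM format with scikit-learn. The file name "a9a" is only an illustrative assumption; any dataset file downloaded from the repository above can be substituted.

```python
# Example only: load a LIBSVM-format dataset (here assumed to be "a9a",
# downloaded from the URL above) into a sparse feature matrix and label vector.
from sklearn.datasets import load_svmlight_file

X, y = load_svmlight_file("a9a")
print(X.shape, y.shape)   # e.g. (32561, 123) samples-by-features for a9a
```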

Notes

  1. www.csie.ntu.edu.tw/~cjlin/libsvmtools/.


Acknowledgements

The authors would like to thank the anonymous reviewers for their insightful and very helpful expert comments and suggestions, which have led to important improvements. The authors thank Dr. Kasper Karlsson for polishing the manuscript.

Funding

This work was funded in part by the National Key Research and Development Program of China (2021YFA1003400), in part by the National Natural Science Foundation of China (62176051), and in part by the Scientific Research Program of Jilin Provincial Department of Education.

Author information

Contributions

Hanger Liu: writing - original draft preparation, conceptualization. Yuqing Liang: writing - review and editing, validation. Jinlan Liu: writing - review and editing, supervision. Dongpo Xu: funding acquisition, supervision, review and editing of the manuscript. All authors have reviewed the manuscript.

Corresponding authors

Correspondence to Jinlan Liu or Dongpo Xu.

Ethics declarations

Ethical Approval and Consent to Participate

Not applicable.

Consent for Publication

Not applicable.

Competing Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Liu, H., Liang, Y., Liu, J. et al. A robust stochastic quasi-Newton algorithm for non-convex machine learning. Appl Intell 55, 569 (2025). https://doi.org/10.1007/s10489-025-06475-5
