Abstract
Stochastic quasi-Newton methods have attracted considerable attention in large-scale machine learning optimization. However, when a stochastic gradient happens to be zero, the quasi-Newton matrix cannot be updated reliably, which undermines the stability of the algorithm. To address this issue, a checkpoint mechanism is introduced: the value of \(\textbf{s}_k\) is checked before the quasi-Newton matrix is updated, which prevents zero increments in the optimization variable and enhances algorithmic stability across iterations. Meanwhile, a novel gradient-increment formulation is introduced to satisfy the curvature condition, facilitating convergence for non-convex objectives. Additionally, limited-memory techniques are employed to reduce storage requirements in large-scale machine learning tasks. Last-iterate convergence of the proposed algorithm is proven in the non-convex setting, which is a stronger guarantee than average- or minimum-iterate convergence. Finally, experiments are conducted on benchmark datasets to compare the proposed RSLBFGS algorithm with other popular first- and second-order methods, demonstrating the effectiveness and robustness of RSLBFGS.
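The abstract does not spell out the exact checkpoint threshold or the precise gradient-increment formula, so the sketch below is purely illustrative: it assumes a simple norm-based check on \(\textbf{s}_k\) before the limited-memory update and a Powell/Li-Fukushima-style damping of \(\textbf{y}_k\) to enforce a curvature-type condition. The function and parameter names (`sketch_rslbfgs`, `checkpoint_tol`, `theta`) are hypothetical and not taken from the paper.

```python
import numpy as np

def two_loop_direction(grad, s_list, y_list):
    """Standard L-BFGS two-loop recursion: approximates -H_k @ grad."""
    q = grad.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    if s_list:  # initial scaling gamma_k = s^T y / y^T y
        s, y = s_list[-1], y_list[-1]
        q *= (s @ y) / (y @ y)
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * (y @ q)
        q += (a - b) * s
    return -q

def sketch_rslbfgs(stoch_grad, x0, lr=0.1, memory=10, n_iters=100,
                   checkpoint_tol=1e-10, theta=1e-4, seed=0):
    """Illustrative stochastic L-BFGS loop with a checkpoint on s_k
    and a damped y_k (assumed formulation, not the paper's exact one)."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    s_list, y_list = [], []
    g = stoch_grad(x, rng)
    for _ in range(n_iters):
        d = two_loop_direction(g, s_list, y_list)
        x_new = x + lr * d
        g_new = stoch_grad(x_new, rng)
        s = x_new - x
        # checkpoint: skip the memory update if s_k is numerically zero,
        # so a vanishing stochastic gradient cannot corrupt the matrix
        if np.linalg.norm(s) > checkpoint_tol:
            y = g_new - g
            # illustrative damping so that s^T y >= theta * ||s||^2
            correction = max(0.0, theta - (s @ y) / (s @ s))
            y = y + correction * s
            s_list.append(s); y_list.append(y)
            if len(s_list) > memory:  # limited-memory: keep the newest pairs
                s_list.pop(0); y_list.pop(0)
        x, g = x_new, g_new
    return x
```

The design point the sketch tries to convey is the separation of concerns: the checkpoint guards against a zero step \(\textbf{s}_k\), the damping of \(\textbf{y}_k\) keeps the curvature pairs well defined for non-convex objectives, and the bounded memory keeps per-iteration storage linear in the problem dimension.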









Availability of Data and Materials
Only publicly available datasets from the LIBSVM repository (www.csie.ntu.edu.tw/~cjlin/libsvmtools/) are used.
Acknowledgements
The authors would like to thank the anonymous reviewers for their insightful and helpful comments and suggestions, which have led to important improvements. The authors also thank Dr. Kasper Karlsson for polishing the manuscript.
Funding
This work was funded in part by the National Key Research and Development Program of China (2021YFA1003400), in part by the National Natural Science Foundation of China (62176051), and in part by the Scientific Research Program of Jilin Provincial Department of Education.
Author information
Authors and Affiliations
Contributions
Hanger Liu: writing - original draft preparation, conceptualization; Yuqing Liang: writing - review and editing, validation; Jinlan Liu: writing - review and editing, supervision; Dongpo Xu: funding acquisition, supervision, writing - review and editing. All authors have reviewed the manuscript.
Corresponding authors
Ethics declarations
Ethical Approval and Consent to Participate
Not applicable.
Consent for Publication
Not applicable.
Competing Interests
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, H., Liang, Y., Liu, J. et al. A robust stochastic quasi-Newton algorithm for non-convex machine learning. Appl Intell 55, 569 (2025). https://doi.org/10.1007/s10489-025-06475-5