A Proximal Stochastic Quasi-Newton Algorithm with Dynamical Sampling and Stochastic Line Search

Journal of Scientific Computing

Abstract

In the field of machine learning, many large-scale optimization problems can be decomposed into the sum of two functions: a smooth function and a nonsmooth function with a simple proximal mapping. In light of this, our paper introduces a novel variant of the proximal stochastic quasi-Newton algorithm built on three key components: (i) an adaptive sampling method that dynamically increases the sample size during the iterations, preventing rapid growth of the sample size while mitigating the noise introduced by the stochastic approximation method; (ii) a stochastic line search that ensures a sufficient decrease in the expected value of the objective function; and (iii) a stable update scheme for the stochastic modified limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm. For a general objective function, we prove that the limit points of the generated sequence are almost surely stationary points, and we analyze the convergence rate together with the number of gradient computations required. For a strongly convex objective function, the method achieves a global linear convergence rate, and the number of required gradient computations is likewise examined. Finally, numerical experiments demonstrate the robustness of the proposed method across various hyperparameter settings and establish its competitiveness with state-of-the-art methods.
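To fix ideas, the problem class described above is \(\min _{x} F(x) = f(x) + h(x)\) with \(f\) smooth and \(h\) admitting an inexpensive proximal mapping. The following is a minimal sketch of a plain (deterministic) proximal gradient step for the illustrative special case \(h(x) = \lambda \Vert x\Vert _1\); it is not the proposed algorithm, and the data, step size and \(\lambda \) below are placeholders.

```python
import numpy as np

# Composite objective F(x) = f(x) + h(x): smooth f (here a least-squares loss)
# plus a nonsmooth h with a cheap proximal mapping (here h = lam * ||x||_1).
def prox_l1(v, step, lam):
    """Soft-thresholding: argmin_x  lam*||x||_1 + ||x - v||^2 / (2*step)."""
    return np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)

def prox_grad_step(x, grad_f, step, lam):
    """One proximal gradient step on F = f + lam*||.||_1."""
    return prox_l1(x - step * grad_f(x), step, lam)

# Tiny usage example with illustrative data: f(x) = 0.5*||Ax - b||^2.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
grad_f = lambda x: A.T @ (A @ x - b)
x = np.zeros(5)
for _ in range(200):
    x = prox_grad_step(x, grad_f, step=1.0 / np.linalg.norm(A, 2) ** 2, lam=0.1)
```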


Availability of Data and Materials

The datasets analysed during the current study are available via the links given in the paper.

References

  1. Beck, A.: First-Order Methods in Optimization. SIAM, Philadelphia (2017)

  2. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  3. Beiser, F., Keith, B., Urbainczyk, S., Wohlmuth, B.: Adaptive sampling strategies for risk-averse stochastic optimization with constraints. IMA J. Numer. Anal. 43(6), 3729–3765 (2023)

  4. Berahas, A.S., Bollapragada, R., Nocedal, J.: An investigation of Newton-sketch and subsampled Newton methods. Optim. Methods Softw. 35(4), 661–680 (2020)

  5. Berahas, A.S., Nocedal, J., Takáč, M.: A multi-batch L-BFGS method for machine learning. Adv. Neural Inf. Process. Syst. 29 (2016)

  6. Bollapragada, R., Byrd, R.H., Nocedal, J.: Adaptive sampling strategies for stochastic optimization. SIAM J. Optim. 28(4), 3312–3343 (2018)

  7. Bollapragada, R., Byrd, R.H., Nocedal, J.: Exact and inexact subsampled Newton methods for optimization. IMA J. Numer. Anal. 39(2), 545–578 (2019)

  8. Bollapragada, R., Nocedal, J., Mudigere, D., Shi, H.J., Tang, P.T.P.: A progressive batching L-BFGS method for machine learning. In: International Conference on Machine Learning, pp. 620–629. PMLR (2018)

  9. Bonettini, S., Loris, I., Porta, F., Prato, M.: Variable metric inexact line-search-based methods for nonsmooth optimization. SIAM J. Optim. 26(2), 891–921 (2016)

  10. Botev, A., Ritter, H., Barber, D.: Practical Gauss-Newton optimisation for deep learning. In: International Conference on Machine Learning, pp. 557–565. PMLR (2017)

  11. Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)

  12. Byrd, R.H., Chin, G.M., Nocedal, J., Wu, Y.: Sample size selection in optimization methods for machine learning. Math. Program. 134(1), 127–155 (2012)

  13. Byrd, R.H., Hansen, S.L., Nocedal, J., Singer, Y.: A stochastic quasi-Newton method for large-scale optimization. SIAM J. Optim. 26(2), 1008–1031 (2016)

  14. Byrd, R.H., Nocedal, J., Oztoprak, F.: An inexact successive quadratic approximation method for L-1 regularized optimization. Math. Program. 157(2), 375–396 (2016)

  15. Byrd, R.H., Nocedal, J., Schnabel, R.B.: Representations of quasi-Newton matrices and their use in limited memory methods. Math. Program. 63(1), 129–156 (1994)

  16. Combettes, P.L., Pesquet, J.C.: Proximal splitting methods in signal processing. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp. 185–212. Springer (2011)

  17. Defazio, A., Domke, J., et al.: Finito: a faster, permutable incremental gradient method for big data problems. In: International Conference on Machine Learning, pp. 1125–1133. PMLR (2014)

  18. Di Serafino, D., Krejić, N., Krklec Jerinkić, N., Viola, M.: LSOS: line-search second-order stochastic optimization methods for nonconvex finite sums. Math. Comput. 92(341), 1273–1299 (2023)

  19. Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)

  20. Franchini, G., Porta, F., Ruggiero, V., Trombini, I.: A line search based proximal stochastic gradient algorithm with dynamical variance reduction. J. Sci. Comput. 94(1), 23 (2023)

  21. Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization I: a generic algorithmic framework. SIAM J. Optim. 22(4), 1469–1492 (2012)

  22. Goldman, R.: Curvature formulas for implicit curves and surfaces. Comput. Aided Geom. Des. 22(7), 632–658 (2005)

  23. Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer, New York (2009)

  24. Reddi, S.J., Sra, S., Póczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Adv. Neural Inf. Process. Syst. 29 (2016)

  25. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. Adv. Neural Inf. Process. Syst. 26 (2013)

  26. Kanzow, C., Lechner, T.: Globalized inexact proximal Newton-type methods for nonconvex composite functions. Comput. Optim. Appl. 78(2), 377–410 (2021)

  27. Lee, C.-P., Wright, S.J.: Inexact successive quadratic approximation for regularized optimization. Comput. Optim. Appl. 72, 641–674 (2019)

  28. Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for minimizing composite functions. SIAM J. Optim. 24(3), 1420–1443 (2014)

  29. Li, D.H., Fukushima, M.: A modified BFGS method and its global convergence in nonconvex minimization. J. Comput. Appl. Math. 129(1–2), 15–35 (2001)

  30. Li, D.H., Fukushima, M.: On the global convergence of the BFGS method for nonconvex unconstrained optimization problems. SIAM J. Optim. 11(4), 1054–1064 (2001)

  31. Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Program. 45(1–3), 503–528 (1989)

  32. Mannel, F., Aggrawal, H.O., Modersitzki, J.: A structured L-BFGS method and its application to inverse problems. Inverse Problems (2023)

  33. Miller, I., Miller, M., Freund, J.E.: John E. Freund's Mathematical Statistics with Applications, 8th edn. Pearson Education Limited (2014)

  34. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer Science & Business Media (2003)

  35. Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: SARAH: a novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621. PMLR (2017)

  36. Nocedal, J.: Theory of algorithms for unconstrained optimization. Acta Numer. 1, 199–242 (1992)

  37. Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: ProxSARAH: an efficient algorithmic framework for stochastic composite nonconvex optimization. J. Mach. Learn. Res. 21(1), 4455–4502 (2020)

  38. Pilanci, M., Wainwright, M.J.: Newton sketch: a near linear-time optimization algorithm with linear-quadratic convergence. SIAM J. Optim. 27(1), 205–245 (2017)

  39. Robbins, H., Siegmund, D.: A convergence theorem for non negative almost supermartingales and some applications. In: Optimizing Methods in Statistics, pp. 233–257. Elsevier (1971)

  40. Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. Adv. Neural Inf. Process. Syst. 25 (2012)

  41. Saratchandran, H., Chng, S.F., Ramasinghe, S., MacDonald, L., Lucey, S.: Curvature-aware training for coordinate networks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 13328–13338 (2023)

  42. Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Math. Program. 162, 83–112 (2017)

  43. Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, Cambridge (2014)

  44. Shalev-Shwartz, S., Tewari, A.: Stochastic methods for \(l_{1}\) regularized loss minimization. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 929–936 (2009)

  45. Shi, J., Yin, W., Osher, S., Sajda, P.: A fast hybrid algorithm for large-scale \(l_{1}\)-regularized logistic regression. J. Mach. Learn. Res. 11, 713–741 (2010)

  46. Vapnik, V.: The Nature of Statistical Learning Theory. Springer Science & Business Media (1999)

  47. Wang, J., Zhang, T.: Utilizing second order information in minibatch stochastic variance reduced proximal iterations. J. Mach. Learn. Res. 20(1), 1578–1633 (2019)

  48. Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM J. Optim. 27(2), 927–956 (2017)

  49. Wang, X., Wang, X., Yuan, Y.-X.: Stochastic proximal quasi-Newton methods for non-convex composite optimization. Optim. Methods Softw. 34(5), 922–948 (2019)

  50. Wang, X., Zhang, H.: Inexact proximal stochastic second-order methods for nonconvex composite optimization. Optim. Methods Softw. 35(4), 808–835 (2020)

  51. Wright, S.J.: Numerical Optimization. Springer (2006)

  52. Xiao, L., Zhang, T.: A proximal stochastic gradient method with progressive variance reduction. SIAM J. Optim. 24(4), 2057–2075 (2014)

  53. Xie, Y., Bollapragada, R., Byrd, R., Nocedal, J.: Constrained and composite optimization via adaptive sampling methods. IMA J. Numer. Anal. 44(2), 680–709 (2024)

  54. Xu, P., Roosta, F., Mahoney, M.W.: Second-order optimization for non-convex machine learning: an empirical study. In: Proceedings of the 2020 SIAM International Conference on Data Mining, pp. 199–207. SIAM (2020)

  55. Xu, P., Yang, J., Roosta, F., Ré, C., Mahoney, M.W.: Sub-sampled Newton methods with non-uniform sampling. Adv. Neural Inf. Process. Syst. 29 (2016)

  56. Yang, M., Milzarek, A., Wen, Z., Zhang, T.: A stochastic extra-step quasi-Newton method for nonsmooth nonconvex optimization. Math. Program. 1–47 (2021)


Acknowledgements

The authors thank the anonymous referees for their careful reading and useful remarks and suggestions that improved the quality of the paper.

Funding

This work was partially supported by the National Natural Science Foundation of China (No. 11971078) and Graduate Research and Innovation Foundation of Chongqing, China (Grant No. CYB23009).

Author information


Corresponding author

Correspondence to Shengjie Li.

Ethics declarations

Conflict of interest

The authors have not disclosed any conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices


1.1 Appendix A: Proof of Theorem 3.1

For the sake of discussion and without loss of generality, we assume that k is greater than the storage size m. Let the eigenvalues of matrix \(\bar{B}_{k}\) be denoted by \(\lambda _{1}, \lambda _{2}, \cdots , \lambda _{d}\), with \(\lambda _{\max }\) being the maximum eigenvalue and \(\lambda _{\min }\) being the minimum eigenvalue.

Lemma 7.1

Suppose that \(\bar{y}_{i}\) is updated through formula (3.16). Then, for all \(i \in \{0, 1,\cdots ,k\}\),

$$\begin{aligned} c \le \dfrac{\bar{y}_{i}^{\top }\bar{y}_{i}}{s_{i}^{\top }\bar{y}_{i}} \le \dfrac{(2L + c)^2}{c} \end{aligned}$$
(7.1)

holds.

Proof

Since \(\Vert s_{i}\Vert \Vert \bar{y}_{i}\Vert \ge s_{i}^{\top }\bar{y}_{i}\) and \(s_{i}^{\top }\bar{y}_{i} \ge c\Vert s_{i}\Vert ^2\) by (3.17), for all \(i \in \{0, 1,\cdots ,k\}\) we obtain that

$$\begin{aligned} \Vert \bar{y}_{i}\Vert \ge c\Vert s_{i}\Vert \end{aligned}$$

holds. Therefore, we get

$$\begin{aligned} \dfrac{\bar{y}_{i}^{\top }\bar{y}_{i}}{s_{i}^{\top }\bar{y}_{i}} \ge \dfrac{\Vert \bar{y}_{i}\Vert ^2}{\Vert s_{i}\Vert \Vert \bar{y}_{i}\Vert } \ge c. \end{aligned}$$
(7.2)

From (3.16), (3.17) and the L-Lipschitz continuity of \(\nabla f_{i}\), the following inequality holds:

$$\begin{aligned} \dfrac{\Vert \bar{y}_{i}\Vert }{\Vert s_{i}\Vert } \le \dfrac{\Vert \tilde{y}_{i}\Vert +\Vert c_{i}\Vert \Vert s_{i}\Vert }{\Vert s_{i}\Vert } \le \dfrac{\Vert \tilde{y}_{i}\Vert + \left( c + \dfrac{\Vert s_{i}\Vert \Vert \tilde{y}_{i}\Vert }{\Vert s_{i}\Vert ^2}\right) \Vert s_{i}\Vert }{\Vert s_{i}\Vert } \le 2L + c. \end{aligned}$$

Furthermore, we can obtain

$$\begin{aligned} \dfrac{\bar{y}_{i}^{\top }\bar{y}_{i}}{s_{i}^{\top }\bar{y}_{i}} = \dfrac{\Vert \bar{y}_{i}\Vert ^2}{\Vert s_{i}\Vert ^{2}}\dfrac{\Vert s_{i}\Vert ^2}{s_{i}^{\top }\bar{y}_{i}} \le \dfrac{(2L + c)^2}{c}. \end{aligned}$$
(7.3)

Due to (7.2) and (7.3), we can conclude (7.1). \(\square \)
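As a quick numerical illustration of the bound (7.1) (a sketch only: formulas (3.16) and (3.17) are not reproduced in this appendix, so we assume a Li–Fukushima-type modification \(\bar{y}_{i} = \tilde{y}_{i} + r_{i}s_{i}\) with \(r_{i} = \max \{0, c - s_{i}^{\top }\tilde{y}_{i}/\Vert s_{i}\Vert ^{2}\}\), which enforces \(s_{i}^{\top }\bar{y}_{i} \ge c\Vert s_{i}\Vert ^{2}\)):

```python
import numpy as np

# Sanity check of (7.1) under the ASSUMED Li-Fukushima-type modification
# y_bar = y_tilde + r*s with r = max(0, c - s^T y_tilde / ||s||^2); illustrative only.
rng = np.random.default_rng(1)
d, c = 10, 0.1
M = rng.standard_normal((d, d))
H = M.T @ M                      # Hessian of a quadratic loss, so y_tilde = H s
L = np.linalg.norm(H, 2)         # Lipschitz constant of its gradient

for _ in range(1000):
    s = rng.standard_normal(d)
    y_tilde = H @ s
    r = max(0.0, c - (s @ y_tilde) / (s @ s))
    y_bar = y_tilde + r * s
    ratio = (y_bar @ y_bar) / (s @ y_bar)
    assert c - 1e-12 <= ratio <= (2 * L + c) ** 2 / c + 1e-12
```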

Lemma 7.2

The updated formula for matrix \(\bar{B}_{k}\) is given by (3.27), and \(\bar{B}_{k-m} = \frac{\bar{y}_{k-m-1}^{\top }\bar{y}_{k-m-1}}{s_{k-m-1}^{\top }\bar{y}_{k-m-1}} I\) (3.25). Then, \(\bar{B}_{k}\) satisfies the following inequality:

$$\begin{aligned} \bar{B}_{k} \preceq \lambda _{\max }I \preceq (d+m) \dfrac{(2L + c)^2}{c}I, \end{aligned}$$

where \(\lambda _{\max }\) is the maximum eigenvalue of \(\bar{B}_{k}\), d is the dimension and m is the history storage size.

Proof

Since \(\bar{B}_{k}\) is positive definite (Remark 3.2) and \(\textrm{tr}(\bar{B}_{k})\) is the sum of all eigenvalues of matrix \(\bar{B}_{k}\), we can deduce that

$$\begin{aligned} \begin{aligned} \lambda _{\max }&\le \lambda _{1} + \lambda _{2} + \cdots + \lambda _{d}\\&= \textrm{tr}(\bar{B}_{k})=\textrm{tr} (\bar{B}_{k-1}) - \dfrac{\Vert \bar{B}_{k-1}s_{k-1}\Vert ^2}{s_{k-1}^{\top } \bar{B}_{k-1}s_{k-1}} +\dfrac{\Vert \bar{y}_{k-1}\Vert ^2}{s_{k-1}^{\top }\bar{y}_{k-1}} ~~~(\text {by }(3.27))\\&\le \textrm{tr} (\bar{B}_{k-1}) + \dfrac{\Vert \bar{y}_{k-1} \Vert ^2}{s_{k-1}^{\top }\bar{y}_{k-1}}\\&\le \textrm{tr} (\bar{B}_{k-1}) + \dfrac{(2L + c)^2}{c} ~~~( \text {by Lemma } 7.1)\\&\le \textrm{tr} (\bar{B}_{k-m}) + m \dfrac{(2L + c)^2}{c}~~~(\text {by } (3.26) \text { and Remark } 3.2)\\&= \textrm{tr} \left( \dfrac{\bar{y}_{k-m-1}^{\top }\bar{y}_{k-m-1}}{s_{k-m-1}^{\top }\bar{y}_{k-m-1}}I\right) + m \dfrac{(2L + c)^2}{c} ~~~(\text {by } (3.25))\\&\le (d+m) \dfrac{(2L + c)^2}{c} ~~~(\text {by Lemma } 7.1), \end{aligned} \end{aligned}$$

where d is the dimension and m is the history storage size. \(\square \)

Lemma 7.3

The updated formula for matrix \(\bar{B}_{k}\) is given by (3.27), and \(\bar{B}_{k-m} = \frac{\bar{y}_{k-m-1}^{\top }\bar{y}_{k-m-1}}{s_{k-m-1}^{\top }\bar{y}_{k-m-1}} I\) (3.25). Then, \(\bar{B}_{k}\) satisfies the following inequality:

$$\begin{aligned} \bar{B}_{k} \succeq \lambda _{\min }I \succeq \left( \dfrac{c^{m+d}}{\lambda _{\max }^{m+d-1}}\right) I, \end{aligned}$$

where \(\lambda _{\min }\) and \(\lambda _{\max }\) are the minimum and maximum eigenvalues of \(\bar{B}_{k}\), respectively, d is the dimension and m is the history storage size.

Proof

Because \(\bar{B}_{k}\) is a symmetric positive definite matrix (Remark 3.2), \(\det (\bar{B}_{k})\) is the product of all the eigenvalues of matrix \(\bar{B}_{k}\). We can then conclude that

$$\begin{aligned} \lambda _{\min } \ge \dfrac{\lambda _{1} \lambda _{2} \cdots \lambda _{d}}{\lambda _{\max }^{d-1}} = \dfrac{\det (\bar{B}_{k})}{\lambda _{\max }^{d-1}}. \end{aligned}$$
(7.4)

Next, we explore the lower bound of \(\det (\bar{B}_{k})\). From (3.27) and (3.23), we can derive the following inequality

$$\begin{aligned} \det (\bar{B}_{k}) =&\det \left( \bar{B}_{k-1} - \dfrac{\bar{B}_{k-1}s_{k-1}s_{k-1}^{\top }\bar{B}_{k-1}}{s_{k-1}^{\top }\bar{B}_{k-1}s_{k-1}} +\dfrac{\bar{y}_{k-1}\bar{y}_{k-1}^{\top }}{s_{k-1}^{\top }\bar{y}_{k-1}}\right) ~~~(\text {by } (3.27))\\ =&\det (\bar{B}_{k-1}) \det \left( I - \dfrac{s_{k-1}s_{k-1}^{\top }\bar{B}_{k-1}}{s_{k-1}^{\top }\bar{B}_{k-1}s_{k-1}} +\dfrac{\bar{B}_{k-1}^{-1}\bar{y}_{k-1}\bar{y}_{k-1}^{\top }}{s_{k-1}^{\top }\bar{y}_{k-1}}\right) \\ =&\det (\bar{B}_{k-1})\left( \left( 1-s_{k-1}^{\top }\frac{\bar{B}_{k-1}s_{k-1}}{s_{k-1}^{\top }\bar{B}_{k-1}s_{k-1}}\right) \left( 1+(\bar{B}_{k-1}^{-1}\bar{y}_{k-1})^{\top }\frac{\bar{y}_{k-1}}{\bar{y}_{k-1}^{\top }s_{k-1}}\right) \right. \\&\left. -\left( -s_{k-1}^{\top }\frac{\bar{y}_{k-1}}{\bar{y}_{k-1}^{\top }s_{k-1}}\right) \left( \frac{(\bar{B}_{k-1} s_{k-1})^{\top }}{s_{k-1}^{\top }\bar{B}_{k-1}s_{k-1}}\bar{B}_{k-1}^{-1}\bar{y}_{k-1}\right) \right) \\ =&\det (\bar{B}_{k-1})\dfrac{s_{k-1}^{\top }\bar{y}_{k-1}}{s_{k-1}^{\top }\bar{B}_{k-1}s_{k-1}}\\ =&\det (\bar{B}_{k-1}) \dfrac{s_{k-1}^{\top }\bar{y}_{k-1}}{\Vert s_{k-1}\Vert ^2}\dfrac{\Vert s_{k-1}\Vert ^2}{s_{k-1}^{\top }\bar{B}_{k-1}s_{k-1}}\\ \ge&\det (\bar{B}_{k-1})\dfrac{c}{\lambda _{\max }}~~~(\text {by Lemma } 7.1)\\ \ge&\det (\bar{B}_{k-m})\left( \dfrac{c}{\lambda _{\max }}\right) ^m ~~~(\text {by } (3.26) \text { and Remark } 3.2)\\ =&\det \left( \dfrac{\bar{y}_{k-m-1}^{\top }\bar{y}_{k-m-1}}{s_{k-m-1}^{\top }\bar{y}_{k-m-1}} I\right) \left( \dfrac{c}{\lambda _{\max }}\right) ^m~~~(\text {by } (3.25))\\ \ge&c^{d}\left( \dfrac{c}{\lambda _{\max }}\right) ^m ~~~ (\text {by Lemma } 7.1), \end{aligned}$$

where the third equality follows from the formula \(\det (I+u_1u_2^\textrm{T}+u_3u_4^\textrm{T})=(1+u_1^\textrm{T}u_2)(1+u_3^\textrm{T}u_4)-(u_1^\textrm{T}u_4)(u_2^\textrm{T}u_3)\). Therefore, combining this with inequality (7.4), we obtain

$$\begin{aligned} \lambda _{\min } \ge \dfrac{c^{m+d}}{\lambda _{\max }^{m+d-1}}, \end{aligned}$$

where d is the dimension and m is the history storage size. \(\square \)

Corollary

Combining Lemmas 7.2 and 7.3, if we define

$$\begin{aligned} \bar{\lambda } = (d+m) \dfrac{(2L + c)^2}{c}~\text {and}~ \underline{\lambda } = \dfrac{c^{m+d}}{\bar{\lambda }^{m+d-1}}, \end{aligned}$$
(7.5)

then

$$\begin{aligned} \underline{\lambda } I \preceq \lambda _{\min } I \preceq \bar{B}_{k} \preceq \lambda _{\max } I \preceq \bar{\lambda } I \end{aligned}$$

holds. Combining Remark 3.2 with Eq. (7.5), Theorem 3.1 is established.
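For a sense of scale, the constants in (7.5) can be evaluated directly; the values of \(L\), \(c\), \(d\) and \(m\) below are illustrative placeholders, not settings used in the paper:

```python
# Evaluating the constants in (7.5): lam_bar = (d+m)(2L+c)^2/c and
# lam_under = c^(m+d) / lam_bar^(m+d-1).
L, c, d, m = 1.0, 0.1, 5, 3
lam_bar = (d + m) * (2 * L + c) ** 2 / c            # = 352.8
lam_under = c ** (m + d) / lam_bar ** (m + d - 1)   # ~ 1.5e-26
print(lam_bar, lam_under)
```

Note how quickly the lower bound \(\underline{\lambda }\) shrinks as \(m + d\) grows.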

1.2 Appendix B: Subproblem Solution

The FISTA algorithm used in this paper follows [2, Section 4] with essentially no modifications.

Algorithm 3 FISTA [2, Section 4]
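For reference, here is a minimal sketch of the standard FISTA iteration of [2, Section 4] with a constant step size, written generically for a subproblem of the form \(\min _{x} \varphi (x) + h(x)\); the function names and the warm-start argument \(t_{0}\) are our own notation rather than the paper's.

```python
import numpy as np

def fista(grad_phi, prox_h, x0, step, t0=1.0, max_iter=100):
    """FISTA (cf. [2, Section 4]) for min_x phi(x) + h(x).

    grad_phi : gradient of the smooth part phi
    prox_h   : prox_h(v, step) = argmin_x h(x) + ||x - v||^2 / (2*step)
    step     : constant step size, e.g. 1/L_phi if grad_phi is L_phi-Lipschitz
    t0       : initial momentum parameter; passing the previous t_j instead of 1
               corresponds to not restarting it (see the remark below)
    """
    x_prev, y, t = x0.copy(), x0.copy(), t0
    for _ in range(max_iter):
        x = prox_h(y - step * grad_phi(y), step)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x + ((t - 1.0) / t_next) * (x - x_prev)
        x_prev, t = x, t_next
    return x_prev, t   # returning t allows warm-starting the next call
```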

It should be noted that we do not set a separate step size, and that \(t_{j}\) does not restart from 1 each time. Since we keep the standard L-BFGS iteration format intact, we can, based on the compact update formula proposed in [15, Sections 2 and 3], rewrite Eq. (3.27) as follows:

$$\begin{aligned} \bar{B}_k:=\bar{B}_{k-m}-\begin{bmatrix}\bar{B}_{k-m}S_k&Y_k\end{bmatrix}\begin{bmatrix}S_k^{\top }\bar{B}_{k-m}S_k& L_k\\ L_k^{\top }& -D_k\end{bmatrix}^{-1}\begin{bmatrix}S_k^{\top }\bar{B}_{k-m}\\ Y_k^{\top }\end{bmatrix}, \end{aligned}$$

where \(Y_{k} = [\bar{y}_{k-m}, \bar{y}_{k-m+1}, \cdots , \bar{y}_{k-1}]\), \(S_{k} = [s_{k-m}, s_{k-m+1}, \cdots , s_{k-1}]\), \(D_{k}\) is the \(m\times m\) diagonal matrix \(D_{k} = \text {diag} [s_{k-m}^{\top } \bar{y}_{k-m}, s_{k-m+1}^{\top } \bar{y}_{k-m+1},\cdots , s_{k-1}^{\top } \bar{y}_{k-1}]\), \(L_{k}\) is the \(m\times m\) matrix

$$\begin{aligned} \left( L_{k}\right) _{i,j}=\left\{ \begin{array}{ll}s_{k-m+i-1}^{\top }\bar{y}_{k-m+j-1}, & \text {if } i>j,\\ 0, & \text {otherwise}.\end{array}\right. \end{aligned}$$
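For readers who wish to experiment, the compact representation above can be formed explicitly as in the small sketch below; the function name and the gamma argument (playing the role of \(\bar{B}_{k-m} = \gamma I\) from (3.25)) are our own, and a practical implementation would keep the factors and apply them matrix-free rather than building the dense matrix.

```python
import numpy as np

def compact_bfgs_matrix(S, Y, gamma):
    """Dense B_k from the compact representation above (cf. [15]).

    S, Y  : d x m arrays whose columns are s_{k-m},...,s_{k-1} and
            y_bar_{k-m},...,y_bar_{k-1}
    gamma : scalar with B_{k-m} = gamma * I, e.g. chosen as in (3.25)
    """
    d, m = S.shape
    SY = S.T @ Y                               # m x m matrix of s_i^T y_bar_j
    D = np.diag(np.diag(SY))                   # D_k: diagonal entries s_i^T y_bar_i
    Lk = np.tril(SY, k=-1)                     # L_k: strictly lower-triangular part
    W = np.hstack([gamma * S, Y])              # [B_{k-m} S_k, Y_k]
    M = np.block([[gamma * (S.T @ S), Lk],     # middle 2m x 2m block matrix
                  [Lk.T, -D]])
    return gamma * np.eye(d) - W @ np.linalg.solve(M, W.T)
```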

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, M., Li, S. A Proximal Stochastic Quasi-Newton Algorithm with Dynamical Sampling and Stochastic Line Search. J Sci Comput 102, 23 (2025). https://doi.org/10.1007/s10915-024-02748-2


  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s10915-024-02748-2

Keywords

Mathematics Subject Classification