Abstract
In the field of machine learning, many large-scale optimization problems can be decomposed into the sum of two functions: a smooth function and a nonsmooth function with a simple proximal mapping. In light of this, our paper introduces a novel variant of the proximal stochastic quasi-Newton algorithm, grounded in three key components: (i) an adaptive sampling method that dynamically increases the sample size during the iteration process, thereby avoiding an overly rapid growth in sample size while mitigating the noise introduced by the stochastic approximation; (ii) a stochastic line search that ensures a sufficient decrease in the expected value of the objective function; and (iii) a stable update scheme for the stochastic modified limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm. For a general objective function, we prove that the limit points of the generated sequence are almost surely stationary points, and we analyze the convergence rate together with the number of gradient computations required. For a strongly convex objective function, we establish a global linear convergence rate and likewise examine the number of required gradient computations. Finally, numerical experiments demonstrate the robustness of the proposed method across various hyperparameter settings and establish its competitiveness with state-of-the-art methods.
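As a point of reference, the problem class and the basic step described in the abstract can be written as follows; the symbols below (\(F\), \(f_{i}\), \(h\), \(S_{k}\), \(B_{k}\), \(\alpha _{k}\)) are chosen only for this illustration and are not taken from the paper's own notation.
\[
\min_{x\in \mathbb{R}^{d}} F(x) = f(x) + h(x), \qquad f(x)=\frac{1}{n}\sum_{i=1}^{n} f_{i}(x),
\]
where each \(f_{i}\) is smooth and \(h\) is nonsmooth with an inexpensive proximal mapping. A proximal stochastic quasi-Newton step then approximately solves a subproblem of the form
\[
x_{k+1} \approx \mathop{\textrm{argmin}}_{x}\; \nabla f_{S_{k}}(x_{k})^{\top }(x-x_{k}) + \frac{1}{2\alpha _{k}}(x-x_{k})^{\top } B_{k}(x-x_{k}) + h(x),
\]
with \(\nabla f_{S_{k}}\) a subsampled gradient over the sample set \(S_{k}\), \(B_{k}\) a limited-memory quasi-Newton matrix, and \(\alpha _{k}\) a step size determined by the stochastic line search.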
Availability of Data and Materials
The datasets analysed during the current study are available via the links given in the paper.
References
Beck, A.: First-order methods in optimization. SIAM (2017)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2(1), 183–202 (2009)
Beiser, F., Keith, B., Urbainczyk, S., Wohlmuth, B.: Adaptive sampling strategies for risk-averse stochastic optimization with constraints. IMA J. Numer. Anal. 43(6), 3729–3765 (2023)
Berahas, A.S., Bollapragada, R., Nocedal, J.: An investigation of Newton-sketch and subsampled Newton methods. Optimiz. Methods Softw. 35(4), 661–680 (2020)
Berahas, A.S., Nocedal, J., Takáč, M.: A multi-batch L-BFGS method for machine learning. Adv. Neural Inf. Proc. Syst. 29, 16 (2016)
Bollapragada, R., Byrd, R.H., Nocedal, J.: Adaptive sampling strategies for stochastic optimization. SIAM J. Opt. 28(4), 3312–3343 (2018)
Bollapragada, R., Byrd, R.H., Nocedal, J.: Exact and inexact subsampled Newton methods for optimization. IMA J. Numer. Anal. 39(2), 545–578 (2019)
Bollapragada, R., Nocedal, J., Mudigere, D., Shi, H.J., Tang, P.T.P.: A progressive batching L-BFGS method for machine learning. In: International Conference on Machine Learning, pp. 620–629. PMLR (2018)
Bonettini, S., Loris, I., Porta, F., Prato, M.: Variable metric inexact line-search-based methods for nonsmooth optimization. SIAM J. Opt. 26(2), 891–921 (2016)
Botev, A., Ritter, H., Barber, D.: Practical Gauss-Newton optimisation for deep learning. In: International Conference on Machine Learning, pp. 557–565. PMLR (2017)
Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)
Byrd, R.H., Chin, G.M., Nocedal, J., Wu, Y.: Sample size selection in optimization methods for machine learning. Math. Progr. 134(1), 127–155 (2012)
Byrd, R.H., Hansen, S.L., Nocedal, J., Singer, Y.: A stochastic quasi-Newton method for large-scale optimization. SIAM J. Opt. 26(2), 1008–1031 (2016)
Byrd, R.H., Nocedal, J., Oztoprak, F.: An inexact successive quadratic approximation method for l-1 regularized optimization. Math. Progr. 157(2), 375–396 (2016)
Byrd, R.H., Nocedal, J., Schnabel, R.B.: Representations of quasi-Newton matrices and their use in limited memory methods. Math. Progr. 63(1), 129–156 (1994)
Combettes, P.L., Pesquet, J.C.: Proximal splitting methods in signal processing. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp. 185–212. Springer (2011)
Defazio, A., Domke, J., et al.: Finito: A faster, permutable incremental gradient method for big data problems. In: International Conference on Machine Learning, pp. 1125–1133. PMLR (2014)
Di Serafino, D., Krejić, N., Krklec Jerinkić, N., Viola, M.: LSOS: Line-search second-order stochastic optimization methods for nonconvex finite sums. Math. Comput. 92(341), 1273–1299 (2023)
Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)
Franchini, G., Porta, F., Ruggiero, V., Trombini, I.: A line search based proximal stochastic gradient algorithm with dynamical variance reduction. J. Sci. Comput. 94(1), 23 (2023)
Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization I: a generic algorithmic framework. SIAM J. Opt. 22(4), 1469–1492 (2012)
Goldman, R.: Curvature formulas for implicit curves and surfaces. Comput. Aided Geomet. Des. 22(7), 632–658 (2005)
Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, vol. 2, p. 103. Springer, New York (2009)
Reddi, S.J., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. Adv. Neural Inf. Proc. Syst. 29, 16 (2016)
Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. Adv. Neural Inf. Proc. Syst. 26, 16 (2013)
Kanzow, C., Lechner, T.: Globalized inexact proximal Newton-type methods for nonconvex composite functions. Computat. Opt. Appl. 78(2), 377–410 (2021)
Lee, Cp., Wright, S.J.: Inexact successive quadratic approximation for regularized optimization. Computat. Opt. Appl. 72, 641–674 (2019)
Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for minimizing composite functions. SIAM J. Opt. 24(3), 1420–1443 (2014)
Li, D.H., Fukushima, M.: A modified BFGS method and its global convergence in nonconvex minimization. J. Computat. Appl. Math. 129(1–2), 15–35 (2001)
Li, D.H., Fukushima, M.: On the global convergence of the BFGS method for nonconvex unconstrained optimization problems. SIAM J. Opt. 11(4), 1054–1064 (2001)
Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Progr. 45(1–3), 503–528 (1989)
Mannel, F., Aggrawal, H.O., Modersitzki, J.: A structured L-BFGS method and its application to inverse problems. Inverse Problems (2023)
Miller, I., Miller, M., Freund, J.E.: John E. Freund's Mathematical Statistics with Applications, 8th edn. Pearson Education Limited (2014)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer Science & Business Media, Cham (2003)
Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: SARAH: A novel method for machine learning problems using stochastic recursive gradient. In: International Conference on Machine Learning, pp. 2613–2621. PMLR (2017)
Nocedal, J.: Theory of algorithms for unconstrained optimization. Acta Numer. 1, 199–242 (1992)
Pham, N.H., Nguyen, L.M., Phan, D.T., Tran-Dinh, Q.: ProxSARAH: an efficient algorithmic framework for stochastic composite nonconvex optimization. J. Mach. Learn. Res. 21(1), 4455–4502 (2020)
Pilanci, M., Wainwright, M.J.: Newton sketch: A near linear-time optimization algorithm with linear-quadratic convergence. SIAM J. Opt. 27(1), 205–245 (2017)
Robbins, H., Siegmund, D.: A convergence theorem for non negative almost supermartingales and some applications. In: Optimizing methods in statistics, pp. 233–257. Elsevier (1971)
Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. Adv. Neural Inf. Process. Syst. 25, 12 (2012)
Saratchandran, H., Chng, S.F., Ramasinghe, S., MacDonald, L., Lucey, S.: Curvature-aware training for coordinate networks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 13328–13338 (2023)
Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Math. Progr. 162, 83–112 (2017)
Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, Cambridge (2014)
Shalev-Shwartz, S., Tewari, A.: Stochastic methods for \(l_{1}\) regularized loss minimization. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 929–936 (2009)
Shi, J., Yin, W., Osher, S., Sajda, P.: A fast hybrid algorithm for large-scale \(l_{1}\)-regularized logistic regression. J. Mach. Learn. Res. 11, 713–741 (2010)
Vapnik, V.: The Nature of Statistical Learning Theory. Springer Science & Business Media, Cham (1999)
Wang, J., Zhang, T.: Utilizing second order information in minibatch stochastic variance reduced proximal iterations. J. Mach. Learn. Res. 20(1), 1578–1633 (2019)
Wang, X., Ma, S., Goldfarb, D., Liu, W.: Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM J. Opt. 27(2), 927–956 (2017)
Wang, X., Wang, X., Yuan, Yx.: Stochastic proximal quasi-Newton methods for non-convex composite optimization. Opt. Methods Softw. 34(5), 922–948 (2019)
Wang, X., Zhang, H.: Inexact proximal stochastic second-order methods for nonconvex composite optimization. Opt. Methods Softw. 35(4), 808–835 (2020)
Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)
Xiao, L., Zhang, T.: A proximal stochastic gradient method with progressive variance reduction. SIAM J. Opt. 24(4), 2057–2075 (2014)
Xie, Y., Bollapragada, R., Byrd, R., Nocedal, J.: Constrained and composite optimization via adaptive sampling methods. IMA J. Numer. Anal. 44(2), 680–709 (2024)
Xu, P., Roosta, F., Mahoney, M.W.: Second-order optimization for non-convex machine learning: An empirical study. In: Proceedings of the 2020 SIAM International Conference on Data Mining, pp. 199–207. SIAM (2020)
Xu, P., Yang, J., Roosta, F., Ré, C., Mahoney, M.W.: Sub-sampled Newton methods with non-uniform sampling. Adv. Neural Inf. Proc. Syst. 29 (2016)
Yang, M., Milzarek, A., Wen, Z., Zhang, T.: A stochastic extra-step quasi-Newton method for nonsmooth nonconvex optimization. Math. Progr., pp. 1–47 (2021)
Acknowledgements
The authors thank the anonymous referees for their careful reading and useful remarks and suggestions that improved the quality of the paper.
Funding
This work was partially supported by the National Natural Science Foundation of China (No. 11971078) and Graduate Research and Innovation Foundation of Chongqing, China (Grant No. CYB23009).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors have not disclosed any conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
1.1 Appendix A: Proof of Theorem 3.1
For the sake of discussion and without loss of generality, we assume that k is greater than the storage size m. Let the eigenvalues of matrix \(\bar{B}_{k}\) be denoted by \(\lambda _{1}, \lambda _{2}, \cdots , \lambda _{d}\), with \(\lambda _{\max }\) being the maximum eigenvalue and \(\lambda _{\min }\) being the minimum eigenvalue.
Lemma 7.1
Suppose that \(\bar{y}_{i}\) is updated through formula (3.16). Then, for all \(i \in \{0, 1,\cdots ,k\}\),
holds.
Proof
Since \(\Vert s_{i}\Vert \Vert \bar{y}_{i}\Vert \ge s_{i}^{\top }\bar{y}_{i}\) and, by (3.17), \(s_{i}^{\top }\bar{y}_{i} \ge c\Vert s_{i}\Vert ^2\), for all \(i \in \{0, 1,\cdots ,k\}\) we obtain
holds. Therefore, we get
From (3.16), (3.17) and the \(L\)-Lipschitz continuity of \(\nabla f_{i}\), the following inequality holds:
Furthermore, we can obtain
Due to (7.2) and (7.3), we can conclude (7.1). \(\square \)
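Since the displays (7.1)–(7.3) are not reproduced above, the following is a minimal sketch of the two-sided bound that this type of argument yields for Li–Fukushima-style modified BFGS pairs; the constant \(C\) is a placeholder for whatever Lipschitz-dependent bound \(\Vert \bar{y}_{i}\Vert \le C\Vert s_{i}\Vert \) follows from (3.16) and (3.17), and is an assumption of this sketch rather than the paper's exact constant.
\[
\frac{\Vert \bar{y}_{i}\Vert ^{2}}{s_{i}^{\top }\bar{y}_{i}} \ge \frac{\Vert \bar{y}_{i}\Vert ^{2}}{\Vert s_{i}\Vert \,\Vert \bar{y}_{i}\Vert } = \frac{\Vert \bar{y}_{i}\Vert }{\Vert s_{i}\Vert } \ge c
\qquad \text{and}\qquad
\frac{\Vert \bar{y}_{i}\Vert ^{2}}{s_{i}^{\top }\bar{y}_{i}} \le \frac{\Vert \bar{y}_{i}\Vert ^{2}}{c\Vert s_{i}\Vert ^{2}} \le \frac{C^{2}}{c}.
\]
Here the last inequality of the first chain uses \(\Vert s_{i}\Vert \Vert \bar{y}_{i}\Vert \ge s_{i}^{\top }\bar{y}_{i} \ge c\Vert s_{i}\Vert ^{2}\), i.e., \(\Vert \bar{y}_{i}\Vert \ge c\Vert s_{i}\Vert \).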
Lemma 7.2
The update formula for the matrix \(\bar{B}_{k}\) is given by (3.27), with \(\bar{B}_{k-m} = \frac{\bar{y}_{k-m-1}^{\top }\bar{y}_{k-m-1}}{s_{k-m-1}^{\top }\bar{y}_{k-m-1}} I\) as in (3.25). Then \(\bar{B}_{k}\) satisfies the following inequality:
where \(\lambda _{\max }\) is the maximum eigenvalue of \(\bar{B}_{k}\), \(d\) is the dimension, and \(m\) is the history storage size.
Proof
Since \(\bar{B}_{k}\) is positive definite (Remark 3.2) and \(\textrm{tr}(\bar{B}_{k})\) is the sum of all eigenvalues of matrix \(\bar{B}_{k}\), we can deduce that
where \(d\) is the dimension and \(m\) is the history storage size. \(\square \)
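For orientation, the trace argument behind this bound is the standard one for limited-memory BFGS matrices; a sketch, writing the \(m\)-fold recursion (3.27) with intermediate matrices \(\bar{B}^{(l)}\) and letting \(M\) denote an upper bound on \(\Vert \bar{y}_{j}\Vert ^{2}/(s_{j}^{\top }\bar{y}_{j})\) of the kind provided by Lemma 7.1 (both symbols are introduced only for this sketch), reads:
\[
\textrm{tr}\bigl(\bar{B}^{(l+1)}\bigr) = \textrm{tr}\bigl(\bar{B}^{(l)}\bigr) - \frac{\Vert \bar{B}^{(l)} s_{j}\Vert ^{2}}{s_{j}^{\top }\bar{B}^{(l)} s_{j}} + \frac{\Vert \bar{y}_{j}\Vert ^{2}}{s_{j}^{\top }\bar{y}_{j}}
\le \textrm{tr}\bigl(\bar{B}^{(l)}\bigr) + M, \qquad j = k-m+l,
\]
where the negative (nonnegative-valued) curvature term is dropped. After the \(m\) updates, and with \(\textrm{tr}(\bar{B}_{k-m}) \le dM\) from the scalar initialization (3.25),
\[
\lambda _{\max } \le \textrm{tr}(\bar{B}_{k}) \le (d+m)M .
\]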
Lemma 7.3
The update formula for the matrix \(\bar{B}_{k}\) is given by (3.27), with \(\bar{B}_{k-m} = \frac{\bar{y}_{k-m-1}^{\top }\bar{y}_{k-m-1}}{s_{k-m-1}^{\top }\bar{y}_{k-m-1}} I\) as in (3.25). Then \(\bar{B}_{k}\) satisfies the following inequality:
where \(\lambda _{\min }\) and \(\lambda _{\max }\) are, respectively, the minimum and maximum eigenvalues of \(\bar{B}_{k}\), \(d\) is the dimension, and \(m\) is the history storage size.
Proof
Because \(\bar{B}_{k}\) is a symmetric positive definite matrix (Remark 3.2) and \(\det (\bar{B}_{k})\) is the product of all eigenvalues of \(\bar{B}_{k}\), we can conclude that
Next, we establish a lower bound on \(\det (\bar{B}_{k})\). From (3.27) and (3.23), we obtain the following inequality:
where the third equality follows from the formula \(\det (I+u_1u_2^{\top }+u_3u_4^{\top })=(1+u_1^{\top }u_2)(1+u_3^{\top }u_4)-(u_1^{\top }u_4)(u_2^{\top }u_3)\). Therefore, combining this with inequality (7.4), we obtain
where \(d\) is the dimension and \(m\) is the history storage size. \(\square \)
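Likewise, the determinant recursion used here is the classical one for BFGS updates; a sketch with the same auxiliary notation \(\bar{B}^{(l)}\) as above, using \(s_{j}^{\top }\bar{y}_{j}\ge c\Vert s_{j}\Vert ^{2}\) from (3.17) and the trace bound \(\lambda _{\max }(\bar{B}^{(l)})\le (d+m)M\):
\[
\det \bigl(\bar{B}^{(l+1)}\bigr) = \det \bigl(\bar{B}^{(l)}\bigr)\,\frac{s_{j}^{\top }\bar{y}_{j}}{s_{j}^{\top }\bar{B}^{(l)} s_{j}}
\ge \det \bigl(\bar{B}^{(l)}\bigr)\,\frac{c\Vert s_{j}\Vert ^{2}}{(d+m)M\,\Vert s_{j}\Vert ^{2}}
= \det \bigl(\bar{B}^{(l)}\bigr)\,\frac{c}{(d+m)M},
\]
and since \(\det (\bar{B}_{k}) \le \lambda _{\min }\,\lambda _{\max }^{\,d-1}\), a lower bound on the determinant translates into a lower bound on \(\lambda _{\min }\).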
Corollary
Combining Lemmas 7.2 and 7.3, if we define
then
holds. Combining Remark 3.2 with Eq. (7.5), Theorem 3.1 is established.
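To make the conclusion explicit (the bounds \(\mu _{1},\mu _{2}\) below are placeholders, since the display (7.5) defining them is not reproduced above), the combined statement is typically of the form
\[
0 < \mu _{1}\, I \preceq \bar{B}_{k} \preceq \mu _{2}\, I \quad \text{for all } k,
\]
i.e., the eigenvalues of every \(\bar{B}_{k}\) are uniformly bounded away from zero and from above, which is the uniform positive definiteness property used in the convergence analysis.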
1.2 Appendix B: Subproblem Solution
The FISTA algorithm used in this paper follows [2, Section 4] with essentially no modifications.
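For completeness, the accelerated scheme of [2, Section 4] applied to the smooth part \(q\) of the subproblem takes the following form; the symbols \(q\), \(L_{q}\) (a Lipschitz constant of \(\nabla q\)) and \(u_{j}\) are notational choices made only for this sketch.
\[
x_{j} = \textrm{prox}_{\frac{1}{L_{q}}h}\Bigl(u_{j} - \tfrac{1}{L_{q}}\nabla q(u_{j})\Bigr),\qquad
t_{j+1} = \frac{1+\sqrt{1+4t_{j}^{2}}}{2},\qquad
u_{j+1} = x_{j} + \frac{t_{j}-1}{t_{j+1}}\bigl(x_{j}-x_{j-1}\bigr),
\]
with \(u_{1}=x_{0}\) and \(t_{1}=1\) in the cold-start version of [2].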
It should be noted that in our implementation we did not set a separate step size, and moreover, \(t_{j}\) does not restart from 1 each time. Because we have not altered the standard L-BFGS update format, the compact representation proposed in [15, Sections 2 and 3] allows us to rewrite Eq. (3.27) as follows:
where \(Y_{k} = [\bar{y}_{k-m}, \bar{y}_{k-m+1}, \cdots , \bar{y}_{k-1}]\), \(S_{k} = [s_{k-m}, s_{k-m+1}, \cdots , s_{k-1}]\), \(D_{k}\) is the \(m\times m\) diagonal matrix \(D_{k} = \text {diag} [s_{k-m}^{\top } \bar{y}_{k-m}, s_{k-m+1}^{\top } \bar{y}_{k-m+1},\cdots , s_{k-1}^{\top } \bar{y}_{k-1}]\), \(L_{k}\) is the \(m\times m\) matrix
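Since the displayed definition of \(L_{k}\) and the rewritten form of Eq. (3.27) are not reproduced above, we record for orientation the standard compact representation of [15] with an initial matrix \(\gamma _{k} I\) (the scalar \(\gamma _{k}\) stands in for the initialization (3.25) and is a placeholder of this sketch):
\[
(L_{k})_{i,j} =
\begin{cases}
s_{k-m-1+i}^{\top }\,\bar{y}_{k-m-1+j}, & i > j,\\
0, & \text{otherwise},
\end{cases}
\qquad
\bar{B}_{k} = \gamma _{k} I -
\begin{bmatrix} \gamma _{k} S_{k} & Y_{k} \end{bmatrix}
\begin{bmatrix} \gamma _{k} S_{k}^{\top }S_{k} & L_{k}\\ L_{k}^{\top } & -D_{k} \end{bmatrix}^{-1}
\begin{bmatrix} \gamma _{k} S_{k}^{\top }\\ Y_{k}^{\top } \end{bmatrix}.
\]
In this form, products with \(\bar{B}_{k}\) reduce to operations with \(d\times m\) and \(m\times m\) matrices, which is what keeps each FISTA iteration on the subproblem inexpensive.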
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, M., Li, S. A Proximal Stochastic Quasi-Newton Algorithm with Dynamical Sampling and Stochastic Line Search. J Sci Comput 102, 23 (2025). https://doi.org/10.1007/s10915-024-02748-2
Keywords
- Proximal stochastic methods
- Stochastic quasi-Newton methods
- Variance reduction
- Line search
- Machine learning