Abstract
We study the convergence rate of the famous Symmetric Rank-1 (SR1) algorithm, which has been widely used in a variety of applications. Although SR1 has been investigated extensively, it still lacks a non-asymptotic superlinear convergence rate, in contrast to other quasi-Newton methods such as DFP and BFGS. In this paper, we address this issue and obtain the first explicit non-asymptotic rates of superlinear convergence for the vanilla SR1 method with a correction strategy that is used to achieve numerical stability. Specifically, the vanilla SR1 method with the correction strategy achieves a rate of the form \(\left( \frac{2n\ln (4\varkappa )}{k}\right) ^{k/2}\) for general smooth strongly convex functions, where k is the iteration counter, \(\varkappa \) is the condition number of the objective function, and n is the dimensionality of the problem. Furthermore, the vanilla SR1 algorithm enjoys a slightly faster convergence rate on quadratic objectives and finds the optimum of a quadratic objective function in at most n steps.
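To illustrate the quadratic-termination property mentioned above, here is a minimal sketch (in Python with NumPy) of the classical secant-form SR1 iteration on a toy quadratic. The skip rule shown is the standard textbook safeguard, not the correction strategy analyzed in this paper; the function name `sr1_step_update`, the test problem, the seed, and the tolerances are our own illustrative choices.

```python
import numpy as np

def sr1_step_update(G, s, y, r=1e-8):
    """Secant-form SR1 update: G_+ = G + v v^T / (v^T s) with v = y - G s.

    The update is skipped when |v^T s| is small relative to ||v|| * ||s||,
    which is the standard numerical safeguard (not the paper's correction)."""
    v = y - G @ s
    denom = v @ s
    if abs(denom) < r * np.linalg.norm(v) * np.linalg.norm(s):
        return G  # skip the update rather than divide by a tiny denominator
    return G + np.outer(v, v) / denom

# Toy strongly convex quadratic f(x) = 0.5 x^T A x - b^T x with Hessian A.
rng = np.random.default_rng(0)
n = 5
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)          # symmetric positive definite Hessian
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)       # exact minimizer

x, G = np.zeros(n), np.eye(n)
for _ in range(n + 1):               # quasi-Newton iterations with unit steps
    grad = A @ x - b
    if np.linalg.norm(grad) < 1e-12:
        break                        # already converged
    s = -np.linalg.solve(G, grad)    # quasi-Newton direction
    y = A @ (x + s) - b - grad       # gradient difference (= A s for quadratics)
    G = sr1_step_update(G, s, y)
    x = x + s

err = np.linalg.norm(x - x_star)
```

For a generic quadratic, SR1's hereditary property makes the approximation absorb the true Hessian after n linearly independent steps, so the next step is an exact Newton step and `err` drops to rounding error.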

Notes
Indeed, the proof of [22, Corollary 4.4] gives \(K_0^{\mathrm {greedy\_SR1}} = 2\varkappa \ln (2n+1)+n\ln (4n\varkappa )\).
References
Berahas, A.S., Jahani, M., Richtárik, P., Takáč, M.: Quasi-Newton methods for deep learning: Forget the past, just sample. arXiv preprint arXiv:1901.09997 (2019)
Boyd, S., Boyd, S.P., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Broyden, C.G.: The convergence of a class of double-rank minimization algorithms: 1. General considerations. IMA J. Appl. Math. 6(1), 76–90 (1970)
Broyden, C.G.: The convergence of a class of double-rank minimization algorithms: 2. The new algorithm. IMA J. Appl. Math. 6(3), 222–231 (1970)
Broyden, C.G., Dennis, J.E., Jr., Moré, J.J.: On the local and superlinear convergence of quasi-Newton methods. IMA J. Appl. Math. 12(3), 223–245 (1973)
Byrd, R.H., Hansen, S.L., Nocedal, J., Singer, Y.: A stochastic quasi-Newton method for large-scale optimization. SIAM J. Optim. 26(2), 1008–1031 (2016)
Byrd, R.H., Khalfan, H.F., Schnabel, R.B.: Analysis of a symmetric rank-one trust region method. SIAM J. Optim. 6(4), 1025–1039 (1996)
Byrd, R.H., Liu, D.C., Nocedal, J.: On the behavior of Broyden’s class of quasi-Newton methods. SIAM J. Optim. 2(4), 533–557 (1992)
Byrd, R.H., Nocedal, J.: A tool for the analysis of quasi-Newton methods with application to unconstrained minimization. SIAM J. Numer. Anal. 26(3), 727–739 (1989)
Byrd, R.H., Nocedal, J., Yuan, Y.X.: Global convergence of a class of quasi-Newton methods on convex problems. SIAM J. Numer. Anal. 24(5), 1171–1190 (1987)
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 1–27 (2011)
Conn, A.R., Gould, N.I., Toint, P.L.: Convergence of quasi-Newton matrices generated by the symmetric rank one update. Math. Program. 50(1), 177–195 (1991)
Dixon, L.: Quasi-Newton algorithms generate identical points. Math. Program. 2(1), 383–387 (1972)
Dixon, L.: Quasi Newton techniques generate identical points II: The proofs of four new theorems. Math. Program. 3(1), 345–358 (1972)
Fletcher, R.: A new approach to variable metric algorithms. Comput. J. 13(3), 317–322 (1970)
Goldfarb, D.: A family of variable-metric methods derived by variational means. Math. Comput. 24(109), 23–26 (1970)
Gower, R., Goldfarb, D., Richtárik, P.: Stochastic block BFGS: Squeezing more curvature out of data. In: International Conference on Machine Learning, pp. 1869–1878. PMLR (2016)
Gower, R.M., Richtárik, P.: Randomized quasi-Newton updates are linearly convergent matrix inversion algorithms. SIAM J. Matrix Anal. Appl. 38(4), 1380–1409 (2017)
Jin, Q., Mokhtari, A.: Non-asymptotic superlinear convergence of standard quasi-Newton methods. arXiv preprint arXiv:2003.13607 (2020)
Kao, C., Chen, S.P.: A stochastic quasi-Newton method for simulation response optimization. Eur. J. Oper. Res. 173(1), 30–46 (2006)
Kovalev, D., Gower, R.M., Richtárik, P., Rogozin, A.: Fast linear convergence of randomized BFGS. arXiv preprint arXiv:2002.11337 (2020)
Lin, D., Ye, H., Zhang, Z.: Greedy and random quasi-newton methods with faster explicit superlinear convergence. Adv. Neural. Inf. Process. Syst. 34, 6646–6657 (2021)
Moritz, P., Nishihara, R., Jordan, M.: A linearly-convergent stochastic L-BFGS algorithm. In: Artificial Intelligence and Statistics, pp. 249–258. PMLR (2016)
Nocedal, J., Wright, S.: Numerical Optimization. Springer Science & Business Media, Berlin (2006)
Powell, M.: On the convergence of the variable metric algorithm. IMA J. Appl. Math. 7(1), 21–36 (1971)
Qu, S., Goh, M., Chan, F.T.: Quasi-Newton methods for solving multiobjective optimization. Oper. Res. Lett. 39(5), 397–399 (2011)
Rodomanov, A., Nesterov, Y.: Greedy quasi-Newton methods with explicit superlinear convergence. SIAM J. Optim. 31(1), 785–811 (2021)
Rodomanov, A., Nesterov, Y.: New results on superlinear convergence of classical Quasi-Newton methods. J. Optim. Theory Appl. 188(3), 744–769 (2021)
Rodomanov, A., Nesterov, Y.: Rates of superlinear convergence for classical quasi-Newton methods. Math. Program. 194(1), 159–190 (2022)
Shanno, D.F.: Conditioning of quasi-Newton methods for function minimization. Math. Comput. 24(111), 647–656 (1970)
Wei, Z., Yu, G., Yuan, G., Lian, Z.: The superlinear convergence of a modified BFGS-type method for unconstrained optimization. Comput. Optim. Appl. 29(3), 315–332 (2004)
Acknowledgements
We would like to thank the two anonymous reviewers for their careful work and constructive comments, which greatly helped us improve the quality of the paper. Ye was supported in part by the National Natural Science Foundation of China under Grant 12101491. Chang was supported in part by the National Natural Science Foundation for Outstanding Young Scholars of China under Grant 72122018 and in part by the Natural Science Foundation of Shaanxi Province under Grant 2021JC-01. Haishan Ye and Dachao Lin contributed equally to this paper.
Appendices
Useful lemmas
Lemma 10
Let approximate Hessians be updated recursively as \(G_{i+1} \triangleq \mathrm {SR1}(A, G_i, u_i)\) (by Eqn. (5)) with an arbitrary \(u_i \in {\mathbb {R}}^n, \forall i \ge 0\). For each \(k \ge 0\), if \(u_i^\top (G_i - A) u_i > 0 \) holds for all \(0 \le i \le k\), then the vectors \(u_0,\dots ,u_k\) are linearly independent. Furthermore, it holds that

\((G_k - A)\, u_i = 0, \quad \forall \, 0 \le i \le k-1. \qquad (65)\)
Proof
Denote \(R_i = G_i - A, \forall i \ge 0\). By the update of SR1, we can obtain that for all \(i \ge 0\),

\(R_{i+1} = R_i - \frac{R_i u_i u_i^\top R_i}{u_i^\top R_i u_i} \quad \text {whenever } u_i^\top R_i u_i \ne 0.\)

In particular, \(R_{i+1} u_i = R_i u_i - R_i u_i = 0\), and \(R_{i+1} v = 0\) for any \(v \in \mathrm {Ker}(R_i)\). Thus, we get

\(\mathrm {Ker}(R_i) \cup \{u_i\} \subseteq \mathrm {Ker}(R_{i+1}), \qquad (66)\)

where \(\mathrm {Ker}(R_i)\) is the null space of \(R_i\).
Next, we prove by induction that \(u_0,\dots , u_k\) are linearly independent provided that \(u_{i}^\top (G_{i} - A) u_{i} > 0, \forall i\le k\). First, for \(k = 0\), the result holds trivially since \(u_0^\top (G_0 - A) u_0 > 0\) implies \(u_0 \ne 0\). Then, we assume that \(u_0,\dots , u_{k}\) are linearly independent for some \(k \ge 0\), and we will show that \(u_0,\dots , u_{k+1}\) are linearly independent provided that \(u_{k+1}^\top (G_{k+1} - A) u_{k+1} > 0\). We prove the result by contradiction: suppose \(u_{k+1}\) can be represented as

\(u_{k+1} = \alpha _0 u_0 + \dots + \alpha _{k} u_{k},\)
where scalars \(\alpha _0, \dots , \alpha _{k}\) are not all zero. Applying Eq. (66), we have \(R_{k+1} u_i=0, \forall 0 \le i < k+1\). Then we further obtain that \(R_{k+1} u_{k+1} = R_{k+1} (\alpha _0 u_0 + \dots + \alpha _{k} u_{k}) = 0 + \dots + 0 = 0\). This contradicts the assumption that \(u_{k+1}^\top R_{k+1} u_{k+1} = u_{k+1}^\top (G_{k+1} - A) u_{k+1} > 0\). Thus, \(u_0,\dots , u_{k+1}\) are linearly independent, and we finish the induction.
Finally, Eq. (65) can be immediately obtained by observing that \(u_i \in \mathrm {Ker}(R_{i+1}) \subseteq \mathrm {Ker}(R_k) \) for all \(i = 0, \dots , k-1\) based on Eq. (66). \(\square \)
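Lemma 10 can be checked numerically. The sketch below applies the matrix-form SR1 update of Eqn. (5) with random directions starting from \(G_0 \succ A\), so that the curvature condition \(u_i^\top (G_i - A) u_i > 0\) holds along the way, and then verifies linear independence of the directions, the hereditary property, and the fact that the residual \(G_n - A\) vanishes after n updates. The seed, dimension, and tolerances are illustrative choices.

```python
import numpy as np

def sr1(A, G, u):
    """Matrix-form SR1 update: G_+ = G - (G-A)uu^T(G-A) / (u^T(G-A)u)."""
    v = (G - A) @ u
    return G - np.outer(v, v) / (u @ v)

rng = np.random.default_rng(1)
n = 6
Q = rng.standard_normal((n, n))
A = Q @ Q.T + np.eye(n)                        # target matrix (positive definite)
G = (np.linalg.norm(A, 2) + 1.0) * np.eye(n)   # G_0 - A is positive definite

us = []
for _ in range(n):
    u = rng.standard_normal(n)
    assert u @ ((G - A) @ u) > 0               # curvature condition of Lemma 10
    G = sr1(A, G, u)
    us.append(u)

# First claim: the directions u_0, ..., u_{n-1} are linearly independent.
assert np.linalg.matrix_rank(np.column_stack(us)) == n
# Hereditary property (Eq. (65)): (G_k - A) u_i = 0 for earlier directions,
# so after n updates the residual G_n - A vanishes up to rounding error.
residual = np.linalg.norm(G - A)
assert residual < 1e-8
```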
Upper bound of M for logistic regression
The logistic regression objective is defined as follows:

\(f(x) = \frac{1}{m}\sum _{i=1}^{m} \ln \left( 1 + \exp \left( -b_i a_i^\top x\right) \right) + \frac{\gamma }{2}\left\| x\right\| ^2,\)

where \(a_i \in {\mathbb {R}}^{n}\) is the i-th input vector, \(b_i\in \{-1,1\}\) is the corresponding label, and \(\gamma \ge 0\) is the regularization parameter. Denote \(\sigma _i(x) \triangleq 1/\left( 1+\exp \left( -b_i a_i^\top x\right) \right) \). Accordingly, for all \(x,h \in {\mathbb {R}}^n\), we have that

\(\nabla ^2 f(x)[h,h] = \frac{1}{m}\sum _{i=1}^{m} \sigma _i(x)\left( 1-\sigma _i(x)\right) \left( a_i^\top h\right) ^2 + \gamma \left\| h\right\| ^2\)

and

\(D^3 f(x)[h,h,h] \overset{(i)}{=} \frac{1}{m}\sum _{i=1}^{m} b_i\, \sigma _i(x)\left( 1-\sigma _i(x)\right) \left( 1-2\sigma _i(x)\right) \left( a_i^\top h\right) ^3,\)

where \(D^3 f(x)[h,h,h] {=} \left. \frac{d^3}{d t^3} f(x+t h)\right| _{t=0}\) is the third derivative of f along the direction h and \((i)\) uses the derivation \(\sigma '(v) = \sigma (v)(1-\sigma (v))\), and hence \(\frac{d}{dv}\left[ \sigma (v)(1-\sigma (v))\right] = \sigma (v)(1-\sigma (v))(1-2\sigma (v))\), for the sigmoid \(\sigma (v) = 1/(1+e^{-v})\).
Since \(|t(1-t)(1-2t)| \le \frac{\sqrt{3}}{18}, \forall t\in (0,1)\) and the equality holds when \( t = \frac{1}{2} - \frac{1}{2\sqrt{3}}\), we further obtain

\(\left| D^3 f(x)[h,h,h]\right| \le \frac{\sqrt{3}}{18m}\sum _{i=1}^{m} \left\| a_i\right\| ^3 \left\| h\right\| ^3. \qquad (68)\)
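The scalar bound above is easy to confirm numerically; the grid resolution below is an arbitrary choice.

```python
import numpy as np

# Confirm max_{t in (0,1)} |t(1-t)(1-2t)| = sqrt(3)/18, attained at
# t = 1/2 - 1/(2*sqrt(3)) (and, by symmetry, at t = 1/2 + 1/(2*sqrt(3))).
t = np.linspace(0.0, 1.0, 2_000_001)
g = np.abs(t * (1 - t) * (1 - 2 * t))
bound = np.sqrt(3) / 18
t_star = 0.5 - 0.5 / np.sqrt(3)

assert g.max() <= bound + 1e-12                 # grid never exceeds the bound
g_star = abs(t_star * (1 - t_star) * (1 - 2 * t_star))
assert abs(g_star - bound) < 1e-15              # equality holds at t_star
```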
By Eq. (68), f(x) has \(L'\)-Lipschitz Hessians with \(L'\triangleq \frac{\sqrt{3}}{18m}\sum _{i=1}^{m} \left\| a_i\right\| ^3\).
Finally, combining with f being \(\gamma \)-strongly convex, we obtain that for all \(x,y, z, w\in {\mathbb {R}}^n\),

\(\nabla ^2 f(y) - \nabla ^2 f(x) \preceq L' \left\| y-x\right\| \cdot I \preceq \frac{L'}{\gamma ^{3/2}} \left\| y-x\right\| _{z}\, \nabla ^2 f(w),\)

where \(\left\| u\right\| _z \triangleq \left( u^\top \nabla ^2 f(z)\, u\right) ^{1/2}\), and the second inequality holds because f is \(\gamma \)-strongly convex, so that \(\left\| y-x\right\| \le \left\| y-x\right\| _z / \sqrt{\gamma }\) and \(I \preceq \nabla ^2 f(w)/\gamma \). Hence, f is M-strongly self-concordant with

\(M = \frac{L'}{\gamma ^{3/2}} = \frac{\sqrt{3}}{18\, m\, \gamma ^{3/2}} \sum _{i=1}^{m} \left\| a_i\right\| ^3.\)
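As a numerical sanity check of the Lipschitz-Hessian constant \(L'\triangleq \frac{\sqrt{3}}{18m}\sum _{i=1}^{m} \left\| a_i\right\| ^3\), the sketch below builds a small random logistic-regression instance and verifies \(\Vert \nabla ^2 f(y) - \nabla ^2 f(x)\Vert _2 \le L' \Vert y - x\Vert \) on random pairs of points. The data, seed, and problem sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, gamma = 20, 4, 0.1
Aa = rng.standard_normal((m, n))          # rows are the input vectors a_i
b = rng.choice([-1.0, 1.0], size=m)       # labels b_i

def hessian(x):
    """Hessian of f(x) = (1/m) sum_i ln(1+exp(-b_i a_i^T x)) + (gamma/2)||x||^2."""
    s = 1.0 / (1.0 + np.exp(-b * (Aa @ x)))   # sigma(b_i a_i^T x)
    w = s * (1.0 - s)                         # per-sample curvature weights
    return (Aa.T * w) @ Aa / m + gamma * np.eye(n)

# Lipschitz constant of the Hessian from the derivation above.
L_prime = np.sqrt(3) / (18 * m) * np.sum(np.linalg.norm(Aa, axis=1) ** 3)

# Check ||H(y) - H(x)||_2 <= L' ||y - x|| on random pairs of points.
for _ in range(50):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    lhs = np.linalg.norm(hessian(y) - hessian(x), 2)
    assert lhs <= L_prime * np.linalg.norm(y - x) + 1e-12
```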
Cite this article
Ye, H., Lin, D., Chang, X. et al. Towards explicit superlinear convergence rate for SR1. Math. Program. 199, 1273–1303 (2023). https://doi.org/10.1007/s10107-022-01865-w