Abstract
This work presents a novel analysis that yields tight complexity bounds for gradient-based methods in convex optimization. We start by identifying some of the pitfalls rooted in the classical complexity analysis of the gradient descent method, and show how they can be remedied. Our methodology hinges on elementary and direct arguments in the spirit of the classical analysis. It allows us to establish new (and reproduce known) tight complexity results for several fundamental algorithms, including gradient descent, proximal point, and proximal gradient methods, which could previously be proven only through computer-assisted convergence proof arguments.
Notes
Though we will commonly write “solving a problem”, here and throughout the rest of the paper, it should be understood as finding an approximate solution to the problem.
Inequality (\(\bigstar \)) will play a starring role in this work, hence the label.
By applying (\(\bigstar \)) twice at (x, y), first as is and secondly after reversing the roles of (x, y).
In the case where \(\Vert \nabla f(x^{n+1})\Vert =\Vert \nabla f(x^{n})\Vert \).
Under the convention that \(T_{-1}:=0\).
Observe that the index changes from i to n when we use (4.7) in our development.
Clearly, the example presented in [4] for the gradient projection method, which illustrates that the complexity bound is tight, also applies to the more general proximal gradient method.
References
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Beck, A.: First-Order Methods in Optimization, vol. 25. SIAM, Philadelphia (2017)
Bertsekas, D.P.: Convex Optimization Algorithms. Athena Scientific, Belmont (2015)
Drori, Y.: Contributions to the complexity analysis of optimization algorithms. Ph.D. Thesis Tel-Aviv University (2014)
Drori, Y., Teboulle, M.: Performance of first-order methods for smooth convex minimization: a novel approach. Math. Program. Ser. A 145(1–2), 451–482 (2014)
Goldstein, A.A.: Convex programming in Hilbert space. Bull. Am. Math. Soc. 70(5), 709–710 (1964)
Güler, O.: On the convergence of the proximal point algorithm for convex minimization. SIAM J. Control Optim. 29(2), 403–419 (1991)
Graham, R., Knuth, D., Patashnik, O.: Concrete Mathematics: A Foundation for Computer Science, 2nd edn. Addison-Wesley, Boston (1994)
Kim, D., Fessler, J.A.: Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions. J. Optim. Theory Appl. 188, 192–219 (2021)
Levitin, E.S., Polyak, B.T.: Constrained minimization methods. USSR Comp. Math. Math. Phys. 6(5), 1–50 (1966)
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)
Martinet, B.: Régularisation d’inéquations variationnelles par approximations successives. Rev. Française Informatique. Recherche Opérationnelle 4, 154–158 (1970)
Moreau, J.-J.: Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 93, 273–299 (1965)
Passty, G.B.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72(2), 383–390 (1979)
Sabach, S., Teboulle, M.: Lagrangian methods for composite optimization. Handb. Numer. Anal. 20, 401–436 (2019)
Taylor, A.B., Hendrickx, J.M., Glineur, F.: Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Math. Program. Ser. A 161(1–2), 307–345 (2017)
Taylor, A.B., Hendrickx, J.M., Glineur, F.: Exact worst-case performance of first-order methods for composite convex optimization. SIAM J. Optim. 27(3), 1283–1313 (2017)
Teboulle, M.: A simplified view of first order methods for optimization. Math. Program. Ser. B 170(1), 67–96 (2018)
Funding
This research was partially supported by the Israel Science Foundation, under ISF Grants 1844-16 and 2619-20.
Appendix A
In the following lemma we establish a useful relation between \(\lambda _n\) and \(T_n\).
Lemma 13
For a given positive integer k, let \(\{t_n\}_{n=0}^{k-1}\) be the sequence defined by the recurrence relation (4.4). Then for any \(n=0,1,\dots ,k-1\) it holds that
Proof
Recall that according to (4.4), \(Lt_0=\sqrt{2}\) and that \(Lt_n\) is the positive root of
for any \(n=1,2,\dots ,k-1\). Thus, \(\lambda _0=Lt_0-1=\sqrt{2}-1\) and \(\lambda _n=Lt_n-1\) is the positive root of
for any \(n=1,2,\dots ,k-1\). The last quadratic equation can be written as
Using the relation \(LT_n=LT_{n-1}+Lt_n=LT_{n-1}+\lambda _n+1\) we can write the above as
The above holds for all \(n=0,1,\dots ,k-1\), including the case \(n=0\), since
\(\square \)
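The bookkeeping above can be checked numerically. The following is a minimal sketch: since the display equations of the recurrence (4.4) are not reproduced in this extract, the values of \(t_1, t_2, \dots\) below are hypothetical placeholders, and only the facts stated in the text are used, namely \(Lt_0=\sqrt{2}\), \(\lambda_n=Lt_n-1\), and \(T_n=T_{n-1}+t_n\) with the convention \(T_{-1}:=0\).

```python
import math

# Sanity check for the identities used in the proof of Lemma 13.
# L is a hypothetical Lipschitz constant; t_1, t_2 are illustrative
# placeholders (the actual recurrence (4.4) is not reproduced here).
L = 1.0
t = [math.sqrt(2) / L, 1.8 / L, 2.1 / L]  # t_0 is fixed by L*t_0 = sqrt(2)

lam = [L * ti - 1 for ti in t]  # lambda_n = L*t_n - 1

# Partial sums T_n = T_{n-1} + t_n, with the convention T_{-1} = 0.
T = [0.0]
for ti in t:
    T.append(T[-1] + ti)
T = T[1:]  # drop the sentinel T_{-1}

# lambda_0 = L*t_0 - 1 = sqrt(2) - 1, as claimed in the proof.
assert abs(lam[0] - (math.sqrt(2) - 1)) < 1e-12

# The relation L*T_n = L*T_{n-1} + lambda_n + 1 holds for every n,
# including n = 0 thanks to the convention T_{-1} = 0.
for n in range(len(t)):
    prev = T[n - 1] if n > 0 else 0.0
    assert abs(L * T[n] - (L * prev + lam[n] + 1)) < 1e-12
```

This only verifies the algebraic identities \(\lambda_n=Lt_n-1\) and \(LT_n=LT_{n-1}+\lambda_n+1\) used in the derivation; it does not substitute for the quadratic-root characterization of \(t_n\) from (4.4).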
We are now ready to prove Lemma 6.
Proof of Lemma 6
We begin by proving that \(\rho _n=0\) for all \(n=0,1,\dots ,k-1\). Since \(\lambda _0+1>0\), in order to verify that \(\rho _0=(\lambda _0+1)(\tau (\lambda _0)-\lambda _0)=0\) it is enough to check that \(\lambda _0=\tau (\lambda _0)\). The latter relation holds for \(\lambda _0=\sqrt{2}-1\). Indeed,
To show that \(\rho _n=0\) for all \(n=1,2,\dots ,k-1\) we first observe that due to Lemma 13
Hence,
and
and thus
Using the above we can write for all \(n=1,2,\dots ,k-1\)
Finally, the asserted value of \(\rho _k\) follows from Lemma 13:
which completes the proof.\(\square \)
Teboulle, M., Vaisbourd, Y. An elementary approach to tight worst case complexity analysis of gradient based methods. Math. Program. 201, 63–96 (2023). https://doi.org/10.1007/s10107-022-01899-0
Keywords
- Convex minimization
- Gradient descent
- Worst-case complexity analysis
- Performance estimation problem
- Composite minimization
- Proximal schemes
- Global rate of convergence