
An elementary approach to tight worst case complexity analysis of gradient based methods

Full Length Paper · Mathematical Programming, Series A

Abstract

This work presents a novel analysis that yields tight complexity bounds for gradient-based methods in convex optimization. We start by identifying some of the pitfalls rooted in the classical complexity analysis of the gradient descent method and show how they can be remedied. Our methodology hinges on elementary and direct arguments in the spirit of the classical analysis. It allows us to establish new (and reproduce known) tight complexity results for several fundamental algorithms, including the gradient descent, proximal point, and proximal gradient methods, which previously could be proven only through computer-assisted convergence proof arguments.

Notes

  1. Though we will commonly write “solving a problem”, here and throughout the rest of the paper, it should be understood as finding an approximate solution to the problem.

  2. Inequality (\(\bigstar \)) will play a starring role in this work, hence the label.

  3. By applying (\(\bigstar \)) twice at \((x,y)\), first as is and secondly after reversing the roles of \(x\) and \(y\).

  4. In the case that \(\Vert \nabla f(x^{n+1})\Vert =\Vert \nabla f(x^{n})\Vert \).

  5. Under the convention that \(T_{-1}:=0\).

  6. Observe that we change the index from \(i\) to \(n\) when we use (4.7) in our development.

  7. Clearly, the example presented in [4] for the gradient projection method, which illustrates that the complexity bound is tight, also holds for the more general proximal gradient method.

References

  1. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  2. Beck, A.: First-Order Methods in Optimization, vol. 25. SIAM, Philadelphia (2017)

  3. Bertsekas, D.P.: Convex Optimization Algorithms. Athena Scientific, Belmont (2015)

  4. Drori, Y.: Contributions to the complexity analysis of optimization algorithms. Ph.D. Thesis, Tel-Aviv University (2014)

  5. Drori, Y., Teboulle, M.: Performance of first-order methods for smooth convex minimization: a novel approach. Math. Program. Ser. A 145(1–2), 451–482 (2014)

  6. Goldstein, A.A.: Convex programming in Hilbert space. Bull. Am. Math. Soc. 70(5), 709–710 (1964)

  7. Güler, O.: On the convergence of the proximal point algorithm for convex minimization. SIAM J. Control. Optim. 29(2), 403–419 (1991)

  8. Graham, R., Knuth, D., Patashnik, O.: Concrete Mathematics: A Foundation for Computer Science, 2nd edn. Addison-Wesley, Boston (1994)

  9. Kim, D., Fessler, J.A.: Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions. J. Optim. Theory Appl. 188, 192–219 (2021)

  10. Levitin, E.S., Polyak, B.T.: Constrained minimization methods. USSR Comp. Math. Math. Phys. 6(5), 1–50 (1966)

  11. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)

  12. Martinet, B.: Régularisation d’inéquations variationnelles par approximations successives. Rev. Française Informatique. Recherche Opérationnelle 4, 154–158 (1970)

  13. Moreau, J.-J.: Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 93, 273–299 (1965)

  14. Passty, G.B.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72(2), 383–390 (1979)

  15. Sabach, S., Teboulle, M.: Lagrangian methods for composite optimization. Handb. Numer. Anal. 20, 401–436 (2019)

  16. Taylor, A.B., Hendrickx, J.M., Glineur, F.: Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Math. Program. Ser. A 161(1–2), 307–345 (2017)

  17. Taylor, A.B., Hendrickx, J.M., Glineur, F.: Exact worst-case performance of first-order methods for composite convex optimization. SIAM J. Optim. 27(3), 1283–1313 (2017)

  18. Teboulle, M.: A simplified view of first order methods for optimization. Math. Program. Ser. B 170(1), 67–96 (2018)


Funding

This research was partially supported by the Israel Science Foundation under ISF Grants 1844-16 and 2619-20.

Author information

Corresponding author

Correspondence to Marc Teboulle.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

In the following lemma we establish a useful relation between \(\lambda _n\) and \(T_n\).

Lemma 13

For a given positive integer k, let \(\{t_n\}_{n=0}^{k-1}\) be the sequence defined by the recurrence relation (4.4). Then for any \(n=0,1,\dots ,k-1\) it holds that

$$\begin{aligned} \lambda _n = \frac{LT_n}{2+LT_n}. \end{aligned}$$
(A.1)

Proof

Recall that according to (4.4), \(Lt_0=\sqrt{2}\) and that \(Lt_n\) is the positive root of

$$\begin{aligned} 0 &= (Lt_n)^2+(LT_{n-1})Lt_n-2(LT_{n-1}+1)\\ &= (Lt_n-1)^2+(LT_{n-1}+2)(Lt_n-1)-(LT_{n-1}+1), \end{aligned}$$

for any \(n=1,2,\dots ,k-1\). Thus, \(\lambda _0=Lt_0-1=\sqrt{2}-1\) and \(\lambda _n=Lt_n-1\) is the positive root of

$$\begin{aligned} \lambda _n^2+(LT_{n-1}+2)\lambda _n-(LT_{n-1}+1)=0, \end{aligned}$$

for any \(n=1,2,\dots ,k-1\). The last quadratic equation can be written as

$$\begin{aligned} \lambda _n\left( 2+LT_{n-1}+(\lambda _n+1)\right) = LT_{n-1}+\lambda _n+1. \end{aligned}$$

Using the relation \(LT_n=LT_{n-1}+Lt_n=LT_{n-1}+\lambda _n+1\) we can write the above as

$$\begin{aligned} \lambda _n = \frac{LT_{n-1}+(\lambda _n+1)}{2+LT_{n-1}+(\lambda _n+1)} = \frac{LT_n}{2+LT_n}. \end{aligned}$$

Thus (A.1) holds for \(n=1,2,\dots ,k-1\); it also holds for \(n=0\), since

$$\begin{aligned} \lambda _0 = \sqrt{2}-1 =(\sqrt{2}-1)\frac{(\sqrt{2}+1)\sqrt{2}}{2+\sqrt{2}} =\frac{(2-1)\sqrt{2}}{2+\sqrt{2}} = \frac{LT_0}{2+LT_0}. \end{aligned}$$

\(\square \)
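
The identity (A.1) also lends itself to a quick numerical sanity check. The short Python sketch below is ours and is not part of the original argument; it iterates the recurrence exactly as described in the proof above (\(Lt_0=\sqrt{2}\), and for \(n\ge 1\), \(Lt_n\) is the positive root of \((Lt_n)^2+(LT_{n-1})Lt_n-2(LT_{n-1}+1)=0\)) and checks that \(\lambda _n=Lt_n-1\) agrees with \(LT_n/(2+LT_n)\). The function name verify_A1 is introduced only for illustration.

```python
import math

def verify_A1(k=10):
    """Numerically check (A.1): lambda_n = L*T_n / (2 + L*T_n)."""
    Lt = math.sqrt(2.0)   # L*t_0 = sqrt(2), as stated in the proof
    LT = Lt               # L*T_0 = L*t_0
    for n in range(k):
        lam = Lt - 1.0                      # lambda_n = L*t_n - 1
        assert abs(lam - LT / (2.0 + LT)) < 1e-12, n
        # L*t_{n+1} is the positive root of x^2 + (L*T_n) x - 2(L*T_n + 1) = 0
        Lt = (-LT + math.sqrt(LT ** 2 + 8.0 * (LT + 1.0))) / 2.0
        LT += Lt                            # L*T_{n+1} = L*T_n + L*t_{n+1}
    print(f"(A.1) verified for n = 0,...,{k - 1}")

verify_A1()
```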

We are now ready to prove Lemma 6.

Proof of Lemma 6

We begin by proving that \(\rho _n=0\) for all \(n=0,1,\dots ,k-1\). Since \(\lambda _0+1>0\), in order to show that \(\rho _0=(\lambda _0+1)(\tau (\lambda _0)-\lambda _0)=0\) it is enough to verify that \(\lambda _0=\tau (\lambda _0)\). The latter relation indeed holds for \(\lambda _0=\sqrt{2}-1\):

$$\begin{aligned} \sqrt{2}-1=\lambda _0=\tau (\lambda _0)&=1-\frac{2\lambda _0^2}{1-\lambda _0} = 1-\frac{2(\sqrt{2}-1)^2}{2-\sqrt{2}}\\ &=1-\frac{\sqrt{2}(\sqrt{2}-1)^2}{\sqrt{2}-1}=\sqrt{2}-1. \end{aligned}$$

To show that \(\rho _n=0\) for all \(n=1,2,\dots ,k-1\) we first observe that due to Lemma 13

$$\begin{aligned} \lambda _{n-1} = \frac{LT_{n-1}}{2+LT_{n-1}}=\frac{LT_{n}-(\lambda _n+1)}{2+LT_{n}-(\lambda _n+1)}=\frac{LT_{n}-\lambda _n-1}{LT_{n}-\lambda _n+1}. \end{aligned}$$

Hence,

$$\begin{aligned} 1+\lambda _{n-1}= \frac{LT_{n}-\lambda _n+1+LT_{n}-\lambda _n-1}{LT_{n}-\lambda _n+1} = \frac{2(LT_n-\lambda _n)}{LT_{n}-\lambda _n+1}, \end{aligned}$$

and

$$\begin{aligned} 1-\lambda _{n-1}= \frac{LT_{n}-\lambda _n+1-\left( LT_{n}-\lambda _n-1\right) }{LT_{n}-\lambda _n+1} = \frac{2}{LT_{n}-\lambda _n+1}, \end{aligned}$$

and thus

$$\begin{aligned} \tau ^+(\lambda _{n-1})=1+\frac{2\lambda _{n-1}}{1-\lambda _{n-1}}=\frac{1+\lambda _{n-1}}{1-\lambda _{n-1}} = LT_n-\lambda _n. \end{aligned}$$
(A.2)

Using the above we can write for all \(n=1,2,\dots ,k-1\)

$$\begin{aligned} \rho _n &= LT_n\tau (\lambda _n)+LT_{n-1}\tau ^{+}(\lambda _{n-1})-(\lambda _n+1)\lambda _n\\ &=LT_n\left( 1-\frac{2\lambda _n^2}{1-\lambda _n}\right) +\left[ LT_n-(\lambda _n+1)\right] (LT_n-\lambda _n)-(\lambda _n+1)\lambda _n\\ &=LT_n\left( 1-\frac{2\lambda _n^2}{1-\lambda _n}\right) +LT_n\left[ LT_n-(2\lambda _n+1)\right] =LT_n\left[ LT_n-\frac{2\lambda _n^2+2\lambda _n(1-\lambda _n)}{1-\lambda _n} \right] \\ &=LT_n\left[ LT_n-\frac{2\lambda _n}{1-\lambda _n}\right] \overset{(A.1)}{=}LT_n\left[ LT_n-\frac{2LT_n/(2+LT_n)}{2/(2+LT_n)}\right] =0. \end{aligned}$$

Finally, the claimed value of \(\rho _k\) is established, due to Lemma 13, as follows:

$$\begin{aligned} \rho _k=LT_{k-1}\tau ^+(\lambda _{k-1})\overset{(A.2)}{=}LT_{k-1}(LT_k-\lambda _k)=LT_{k-1}(LT_{k-1}+1), \end{aligned}$$

which completes the proof.\(\square \)
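
In the same spirit, the conclusion of Lemma 6 can be checked numerically. The Python sketch below is again ours and not part of the original argument (the helper names tau, tau_plus and verify_lemma6 are chosen only for illustration); it uses the expressions \(\tau (\lambda )=1-\frac{2\lambda ^2}{1-\lambda }\) and \(\tau ^+(\lambda )=1+\frac{2\lambda }{1-\lambda }\) that appear in the proof above, rebuilds \(\{Lt_n\}\) from the recurrence, and confirms that \(\rho _0=\dots =\rho _{k-1}=0\) while \(\rho _k=LT_{k-1}(LT_{k-1}+1)\).

```python
import math

def tau(lam):
    # tau(lambda) = 1 - 2*lambda^2 / (1 - lambda), as in the proof above
    return 1.0 - 2.0 * lam ** 2 / (1.0 - lam)

def tau_plus(lam):
    # tau^+(lambda) = 1 + 2*lambda / (1 - lambda)
    return 1.0 + 2.0 * lam / (1.0 - lam)

def verify_lemma6(k=8):
    # Build L*t_n and the partial sums L*T_n from the recurrence.
    Lt = [math.sqrt(2.0)]
    LT = [Lt[0]]
    for _ in range(1, k):
        prev = LT[-1]
        Lt.append((-prev + math.sqrt(prev ** 2 + 8.0 * (prev + 1.0))) / 2.0)
        LT.append(prev + Lt[-1])
    lam = [x - 1.0 for x in Lt]                 # lambda_n = L*t_n - 1

    # rho_0 and rho_n for n = 1,...,k-1, as written in the proof of Lemma 6
    rho = [(lam[0] + 1.0) * (tau(lam[0]) - lam[0])]
    for n in range(1, k):
        rho.append(LT[n] * tau(lam[n]) + LT[n - 1] * tau_plus(lam[n - 1])
                   - (lam[n] + 1.0) * lam[n])
    rho_k = LT[k - 1] * tau_plus(lam[k - 1])    # the boundary term rho_k

    assert all(abs(r) < 1e-9 for r in rho)
    assert abs(rho_k - LT[k - 1] * (LT[k - 1] + 1.0)) < 1e-9
    print(f"Lemma 6 verified numerically for k = {k}")

verify_lemma6()
```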

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Teboulle, M., Vaisbourd, Y. An elementary approach to tight worst case complexity analysis of gradient based methods. Math. Program. 201, 63–96 (2023). https://doi.org/10.1007/s10107-022-01899-0
