
An adaptive accelerated first-order method for convex optimization

Abstract

This paper presents a new accelerated variant of Nesterov’s method for solving composite convex optimization problems, in which certain acceleration parameters are chosen adaptively (and aggressively) so as to substantially improve practical performance compared to existing accelerated variants, while preserving the optimal iteration-complexity shared by these methods. Computational results demonstrate that the proposed adaptive accelerated method, endowed with a restarting scheme, outperforms other existing accelerated variants.

References

  1. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Img. Sci. 2(1), 183–202 (2009). http://dx.doi.org/10.1137/080716542

  2. Burer, S., Monteiro, R.D.C., Zhang, Y.: A computational study of a gradient-based log-barrier algorithm for a class of large-scale SDPs. Math. Program. 95, 359–379 (2003). http://dx.doi.org/10.1007/s10107-002-0353-7

  3. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. CORE Discussion Paper 2011/02, Center for Operations Research and Econometrics (CORE), Catholic University of Louvain, Dec 2010

  4. Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002). http://dx.doi.org/10.1007/s101070100263

  5. Goldfarb, D., Scheinberg, K.: Fast first-order methods for composite convex optimization with line search. Optimization Online preprint 3004 (2011). http://www.optimization-online.org/DB_HTML/2011/04/3004.html

  6. Gonzaga, C.C., Karas, E.W.: Fine tuning Nesterov’s steepest descent algorithm for differentiable convex programming. Math. Program., 1–26 (2012). http://dx.doi.org/10.1007/s10107-012-0541-z

  7. Gonzaga, C.C., Karas, E.W., Rossetto, D.R.: An optimal algorithm for constrained differentiable convex optimization. SIAM J. Optim. 23(4), 1939–1955 (2013). http://dx.doi.org/10.1137/110836602

  8. Lan, G., Lu, Z., Monteiro, R.D.C.: Primal-dual first-order methods with iteration-complexity for cone programming. Math. Program. 126(1), 1–29 (2011)

  9. Merritt, M., Zhang, Y.: Interior-point gradient method for large-scale totally nonnegative least squares problems. J. Optim. Theory Appl. 126, 191–202 (2005). http://dx.doi.org/10.1007/s10957-005-2668-z

  10. Monteiro, R.D.C., Svaiter, B.F.: An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM J. Optim. 23(2), 1092–1125 (2013). http://epubs.siam.org/doi/abs/10.1137/110833786

  11. Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence \(O(1/k^2)\). Doklady AN SSSR 269(3), 543–547 (1983)

  12. Nesterov, Y.: Introductory lectures on convex optimization: a basic course. Applied Optimization, vol. 87. Kluwer Academic Publishers, Boston (2004)

  13. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)

  14. Nesterov, Y.: Dual extrapolation and its applications to solving variational inequalities and related problems. Math. Program. 109(2–3, Ser. B), 319–344 (2007)

  15. Nesterov, Y.: Gradient methods for minimizing composite objective function. CORE, Leuven (2007)

  16. O’Donoghue, B., Candès, E.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math., 1–18 (2013). http://dx.doi.org/10.1007/s10208-013-9150-3

  17. Povh, J., Rendl, F., Wiegele, A.: A boundary point method to solve semidefinite programs. Computing 78, 277–286 (2006). http://dx.doi.org/10.1007/s00607-006-0182-2

  18. Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization. Submitted to SIAM J. Optim. (2008)

  19. Zhao, X.-Y., Sun, D., Toh, K.-C.: A Newton-CG augmented Lagrangian method for semidefinite programming. SIAM J. Optim. 20(4), 1737–1765 (2010). http://link.aip.org/link/?SJE/20/1737/1

Acknowledgments

We thank Elizabeth W. Karas for generously providing us with the code of the algorithms in [7]. Renato D.C. Monteiro: The work of this author was partially supported by NSF Grants CMMI-0900094 and CMMI-1300221, and ONR Grant N00014-11-1-0062. Benar F. Svaiter: The work of this author was partially supported by CNPq Grants No. 303583/2008-8, 302962/2011-5, 480101/2008-6, 474944/2010-7, and FAPERJ Grants E-26/102.821/2008 and E-26/102.940/2011.

Author information

Correspondence to Camilo Ortiz.

Appendices

Appendix 1: Technical results used in Section 2

This appendix establishes a technical result, namely Lemma 4, which is used in the proof of Proposition 1 of Sect. 2.

Throughout this appendix we assume that the functions \(\phi \), f and h, and the set \(\Omega \), satisfy conditions C.1–C.5 of Sect. 2. Before stating and proving Lemma 4, we establish two technical results.

Lemma 2

Let \(A\ge 0\), \(\lambda >0\) and \(x^{0}\), \(y\in \mathcal {X}\) be given, and assume that \(\Gamma :\mathcal {X}\rightarrow \bar{\mathbb {R}}\) is a proper closed convex function such that \(A\Gamma \) is \(A\mu \)-strongly convex. Assume also that the triple \((y,A,\Gamma )\) satisfies

$$\begin{aligned} A\Gamma \le A\phi ,\quad A\phi (y)\le & {} \min _{u\in \mathcal {X}}\left\{ A\Gamma (u)+\frac{1}{2}\Vert u-x^{0}\Vert ^{2}\right\} . \end{aligned}$$
(19)

Also, define \(x\in \mathcal {X}\), \(a>0\) and \(\tilde{x}\in \mathcal {X}\) as

$$\begin{aligned}&x:=\arg \min _{u\in \mathcal {X}}\left\{ A\Gamma (u)+\frac{1}{2}\Vert u-x^{0}\Vert ^{2}\right\} , \end{aligned}$$
(20)
$$\begin{aligned}&a:=\frac{\lambda (A\mu +1)+\sqrt{\lambda ^{2}(A\mu +1)^{2}+4\lambda (A\mu +1)A}}{2},\qquad \tilde{x}:=\frac{A}{A+a}y+\frac{a}{A+a}x. \end{aligned}$$
(21)

Then, for any proper closed convex function \(\gamma :\mathcal {X}\rightarrow \bar{\mathbb {R}}\) minorizing \(\phi \), we have

$$\begin{aligned} \min _{u\in \mathcal {X}}\left\{ a\gamma (u)+A\Gamma (u)+\frac{1}{2}\Vert u-x^{0}\Vert ^{2}\right\} \ge (A+a)\theta \end{aligned}$$

where

$$\begin{aligned} \theta :=\min _{u\in \mathcal {X}}\left\{ \gamma (u)+\frac{1}{2\lambda }\Vert u-\tilde{x}\Vert ^{2}\right\} . \end{aligned}$$
(22)

Proof

Note that the definition of a in (21) implies that

$$\begin{aligned} \lambda =\frac{a^{2}}{(A+a)(A\mu +1)}. \end{aligned}$$
(23)

Now, let an arbitrary \(u\in \mathcal {X}\) be given and define

$$\begin{aligned} \tilde{u}:=\frac{A}{A+a}y+\frac{a}{A+a}u. \end{aligned}$$

Clearly, in view of the second identity in (21), we have

$$\begin{aligned} \tilde{u}-\tilde{x}=\frac{a}{A+a}(u-x). \end{aligned}$$

In view of (19), (20), the last two relations, and the fact that \(\gamma \) is a convex function minorizing \(\phi \) and the function \(A\Gamma (\cdot )+\Vert \cdot -x^{0}\Vert ^{2}/2\) is \((A\mu +1)\)-strongly convex, we conclude that

$$\begin{aligned} a\gamma (u)&+A\Gamma (u)+\frac{1}{2}\Vert u-x^{0}\Vert ^{2}\\&\ge a\gamma (u)+\frac{A\mu +1}{2}\Vert u-x\Vert ^{2}+\min _{u\in \mathcal {X}}\left\{ A\Gamma (u)+\frac{1}{2}\Vert u-x^{0}\Vert ^{2}\right\} \\&\ge a\gamma (u)+\frac{A\mu +1}{2}\Vert u-x\Vert ^{2}+A\gamma (y)\\&\ge (A+a)\gamma \left( \tilde{u}\right) +\frac{A\mu +1}{2}\Vert u-x\Vert ^{2}\\&=(A+a)\left[ \gamma (\tilde{u})+\frac{(A+a)(A\mu +1)}{2a^{2}}\Vert \tilde{u}-\tilde{x}\Vert ^{2}\right] \ge (A+a)\theta \end{aligned}$$

where the last inequality is due to (23) and the definition of \(\theta \) in (22). Since the above inequality holds for every \(u\in \mathcal {X}\), the conclusion of the lemma follows. \(\square \)
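As a quick sanity check (ours, not part of the paper), the following snippet verifies numerically that the value of \(a\) defined in (21) satisfies identity (23); the values of \(A\), \(\lambda \) and \(\mu \) below are arbitrary.

```python
import math

# Arbitrary sample values (illustrative only): A >= 0, lambda > 0, mu >= 0.
A, lam, mu = 3.0, 0.5, 0.2

# a as defined in (21).
lam_mu = lam * (A * mu + 1.0)
a = (lam_mu + math.sqrt(lam_mu**2 + 4.0 * lam_mu * A)) / 2.0

# Identity (23): lambda = a^2 / ((A + a)(A*mu + 1)).
print(abs(lam - a**2 / ((A + a) * (A * mu + 1.0))) < 1e-12)  # True
```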

Lemma 3

Let \(\lambda >0\), \(\xi \in \mathcal {X}\) and a proper closed \(\mu \)-strongly convex function \(g:\mathcal {X}\rightarrow \bar{\mathbb {R}}\) be given and denote the optimal solution of

$$\begin{aligned} \min _{x\in \mathcal {X}}\left\{ g(x)+\frac{1}{2\lambda }\Vert x-\xi \Vert ^{2}\right\} \end{aligned}$$
(24)

by \(\bar{x}\). Then, the function \(\gamma :\mathcal {X}\rightarrow \mathbb {R}\) defined as

$$\begin{aligned} \gamma (u):=g(\bar{x})+\frac{1}{\lambda }\langle {\xi -\bar{x}},{u-\bar{x}}\rangle +\frac{\mu }{2}\Vert u-\bar{x}\Vert ^{2}\quad \forall u\in \mathcal {X} \end{aligned}$$

has the property that \(\gamma \le g\) and \(\bar{x}\) is the optimal solution of

$$\begin{aligned} \min _{u\in \mathcal {X}}\left\{ \gamma (u)+\frac{1}{2\lambda }\Vert u-\xi \Vert ^{2}\right\} . \end{aligned}$$
(25)

As a consequence, the optimal value of (25) is the same as that of (24).

Proof

The optimality condition for (24) implies that \((\xi -\bar{x})/\lambda \in \partial g(\bar{x})\), and hence that \(\gamma \le g\) in view of the definition of \(\gamma \) and the fact that g is a proper closed \(\mu \)-strongly convex function. Moreover, \(\bar{x}\) clearly satisfies the optimality condition for (25), from which the second claim of the lemma follows. Finally, the latter conclusion of the lemma follows immediately from its second claim. \(\square \)
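To illustrate Lemma 3 (our own example, not from the paper), the sketch below instantiates it for the one-dimensional \(\mu \)-strongly convex function \(g(x)=\frac{\mu }{2}x^{2}+|x|\), for which the minimizer of (24) has a closed soft-thresholding form, and checks numerically that \(\gamma \le g\) and that (24) and (25) have the same optimal value.

```python
import numpy as np

# Illustrative 1-D instance: g(x) = (mu/2)*x^2 + |x| is proper, closed and mu-strongly convex.
mu, lam, xi = 0.5, 0.8, 2.0

def g(x):
    return 0.5 * mu * x**2 + np.abs(x)

# Closed-form solution of (24) for this g (soft-thresholding plus shrinkage).
xbar = np.sign(xi) * max(abs(xi) - lam, 0.0) / (1.0 + lam * mu)

# The minorant gamma from Lemma 3.
def gamma(u):
    return g(xbar) + (xi - xbar) / lam * (u - xbar) + 0.5 * mu * (u - xbar)**2

u = np.linspace(-5.0, 5.0, 2001)
obj24 = g(u) + (u - xi)**2 / (2.0 * lam)
obj25 = gamma(u) + (u - xi)**2 / (2.0 * lam)
print(np.all(gamma(u) <= g(u) + 1e-12))          # gamma minorizes g
print(abs(u[np.argmin(obj25)] - xbar) < 1e-2)    # xbar also solves (25)
print(abs(obj24.min() - obj25.min()) < 1e-6)     # equal optimal values (up to grid error)
```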

The main result of this appendix is as follows.

Lemma 4

Let \(A\ge 0\), \(0<\lambda \le 1/(L-\mu _{f})\), \(x^{0},y\in \mathcal {X}\) and a proper closed \(\mu \)-strongly convex function \(\Gamma :\mathcal {X}\rightarrow \mathbb {R}\) be given, and assume that the triple \((y,A,\Gamma )\) satisfies (19). Let x, a and \(\tilde{x}\) be as in (20) and (21), and define

$$\begin{aligned} \tilde{x}_{\Omega }&:=\Pi _{\Omega }(\tilde{x}),\qquad A^{+}:=A+a, \end{aligned}$$
(26)
$$\begin{aligned} y^{+}&:=\arg \min _{u\in \mathcal {X}}\left\{ p(u)+\frac{1}{2\lambda }\Vert u-\tilde{x}\Vert ^{2}\right\} \end{aligned}$$
(27)

where

$$\begin{aligned} p(u):=f(\tilde{x}_{\Omega })+\langle {\nabla f(\tilde{x}_{\Omega })},{u-\tilde{x}_{\Omega }}\rangle +\frac{\mu _{f}}{2}\Vert u-\tilde{x}_{\Omega }\Vert ^{2}+h(u)\quad \forall u\in \mathcal {X}. \end{aligned}$$
(28)

Also, define the functions \(\gamma ,\Gamma ^{+}:\mathcal {X}\rightarrow \bar{\mathbb {R}}\) as

$$\begin{aligned} \gamma (u):= & {} p(y^{+})+\frac{1}{\lambda }\langle {\tilde{x}-y^{+}},{u-y^{+}}\rangle +\frac{\mu }{2}\Vert u-y^{+}\Vert ^{2}\quad \forall u\in \mathcal {X}, \end{aligned}$$
(29)
$$\begin{aligned} \Gamma ^{+}:= & {} \frac{A}{A+a}\Gamma +\frac{a}{A+a}\gamma \end{aligned}$$
(30)

Then, \(\Gamma ^{+}\) is a proper closed \(\mu \)-strongly convex function and the triple \((y^{+},A^{+},\Gamma ^{+})\) satisfies

$$\begin{aligned}&A^{+}\phi (y^{+}) \le \min _{u\in \mathcal {X}}\left\{ A^{+}\Gamma ^{+}(u)+\frac{1}{2}\Vert u-x^{0}\Vert ^{2}\right\} ,\qquad A^{+}\Gamma ^{+}\le A^{+}\phi , \end{aligned}$$
(31)
$$\begin{aligned}&\sqrt{A^{+}}\ge \sqrt{A}+\frac{1}{2}\sqrt{\lambda (A\mu +1)}. \end{aligned}$$
(32)

Moreover, if \(\Gamma \) is a quadratic function with Hessian equal to \(\mu I\), then the (unique) optimal solution \(x^{+}\) of the minimization problem in (31) can be obtained as

$$\begin{aligned} x^{+}=\frac{1}{1+\mu (A+a)}\left[ x-\frac{a}{\lambda }(\tilde{x}-y^{+})+\mu (Ax+ay^{+})\right] . \end{aligned}$$
(33)
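Before turning to the proof, here is a small sketch (ours) of how the quantities in (21), (26)–(28) and (33) can be iterated on a toy instance with \(f(x)=\frac{1}{2}\Vert Bx-b\Vert ^{2}\), \(h\equiv 0\), \(\Omega =\mathcal {X}=\mathbb {R}^{n}\) and \(\mu =\mu _{f}=0\), so that \(\Pi _{\Omega }\) is the identity and (27) reduces to a gradient step from \(\tilde{x}\). This is only a simplified instantiation of the update, not the paper's adaptive accelerated method.

```python
import numpy as np

# Toy composite instance: f(x) = 0.5*||B x - b||^2, h = 0, Omega = R^n,
# hence mu_f = 0, Pi_Omega is the identity, and (27) is a plain gradient step.
rng = np.random.default_rng(0)
B = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
grad_f = lambda x: B.T @ (B @ x - b)
L = np.linalg.norm(B.T @ B, 2)            # Lipschitz constant of grad f
lam = 1.0 / L                             # 0 < lambda <= 1/(L - mu_f)
mu = mu_f = 0.0

# With A = 0, condition (19) holds trivially and x = argmin{0.5*||u - x0||^2} = x0.
x0 = np.zeros(5)
A, y, x = 0.0, x0.copy(), x0.copy()

for _ in range(50):
    lam_mu = lam * (A * mu + 1.0)
    a = (lam_mu + np.sqrt(lam_mu**2 + 4.0 * lam_mu * A)) / 2.0   # a as in (21)
    x_tilde = (A * y + a * x) / (A + a)                          # x_tilde as in (21)
    y_plus = x_tilde - lam * grad_f(x_tilde)                     # (27)-(28) with h = 0, mu_f = 0
    x_plus = (x - (a / lam) * (x_tilde - y_plus)
              + mu * (A * x + a * y_plus)) / (1.0 + mu * (A + a))  # (33)
    A, y, x = A + a, y_plus, x_plus

x_star = np.linalg.lstsq(B, b, rcond=None)[0]
print(np.linalg.norm(y - x_star))         # distance to the least-squares solution
```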

Proof

We first claim that

$$\begin{aligned} \gamma \le \phi ,\qquad \phi (y^{+})\le \min _{u\in \mathcal {X}}\left\{ \gamma (u)+\frac{1}{2\lambda }\Vert u-\tilde{x}\Vert ^{2}\right\} . \end{aligned}$$
(34)

To prove the above claim, note that conditions C.2 and C.3, the non-expansiveness property of \(\Pi _{\Omega }\), and the assumption that \(\lambda \in (0,1/(L-\mu _{f})]\), imply that

$$\begin{aligned} p(u)\le \phi (u)\le p(u)+\frac{L-\mu _{f}}{2}\Vert u-\tilde{x}_{\Omega }\Vert ^{2}\le p(u)+\frac{1}{2\lambda }\Vert u-\tilde{x}\Vert ^{2}\quad \forall u\in \Omega . \end{aligned}$$
(35)

Then, in view of (27) and the above relation with \(u=y^{+}\), we have

$$\begin{aligned} \min _{u\in \mathcal {X}}\left\{ p(u)+\frac{1}{2\lambda }\Vert u-\tilde{x}\Vert ^{2}\right\}&=p(y^{+})+\frac{1}{2\lambda }\Vert y^{+}-\tilde{x}\Vert ^{2}\ge \phi (y^{+}). \end{aligned}$$

The claim now follows from the first inequality in (35), the above relation, the definitions of \(\gamma \) and p, and Lemma 3 with \(g=p\) and \(\xi =\tilde{x}\) (and hence \(\bar{x}=y^{+}\)).

Now, using the assumption that (19) holds, the second relation in (34) and Lemma 2, we conclude that

$$\begin{aligned} \min _{u\in \mathcal {X}}\left\{ a\gamma (u)+A\Gamma (u)+\frac{1}{2}\Vert u-x^{0}\Vert ^{2}\right\}&\ge (A+a)\min _{u\in \mathcal {X}}\left\{ \gamma (u)+\frac{1}{2\lambda }\Vert u-\tilde{x}\Vert ^{2}\right\} \\&\ge (A+a)\phi (y^{+}), \end{aligned}$$

and hence that the first inequality in (31) holds in view of (26) and (30). Moreover, the first inequality in (34), relations (26) and (30), and the first inequality of (19), clearly imply the second inequality in (31).

To show (32), let \(\lambda _{\mu }:=\lambda (A\mu +1)\) and note that the definition of a in (21) implies that

$$\begin{aligned} a\ge \frac{\lambda _{\mu }}{2}+\sqrt{\lambda _{\mu }A}. \end{aligned}$$

This inequality together with the definition of \(A^{+}\) in (26) then yields

$$\begin{aligned} A^{+}\ge A+\left( \frac{\lambda _{\mu }}{2}+\sqrt{\lambda _{\mu }A}\right) \ge \left( \sqrt{A}+\frac{1}{2}\sqrt{\lambda _{\mu }}\right) ^{2}, \end{aligned}$$

showing that (32) holds. Finally, to show (33), first observe that the optimality conditions for (20) and (31) imply that \(x=x^{0}-A\nabla \Gamma (x)\) and \(x^{+}=x^{0}-A^{+}\nabla \Gamma ^{+}(x^{+})\). These two identities together with relations (26), (29) and (30), and the assumption that \(\Gamma \) is a quadratic function with Hessian equal to \(\mu I\), then imply that

$$\begin{aligned} x^{+}&=x^{0}-A\nabla \Gamma (x^{+})-a\nabla \gamma (x^{+})\\&=x^{0}-A\nabla \Gamma (x)-a\nabla \gamma (x^{+})+A[\nabla \Gamma (x)-\nabla \Gamma (x^{+})]\\&=x-a\left[ \frac{1}{\lambda }(\tilde{x}-y^{+})+\mu (x^{+}-y^{+})\right] +A[\mu (x-x^{+})], \end{aligned}$$

and hence that (33) holds. \(\square \)
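Inequality (32) is the source of the growth behind the optimal \(O(1/k^{2})\) iteration-complexity mentioned in the abstract: with a fixed \(\lambda \) and \(\mu =0\), summing the square roots over k steps gives \(\sqrt{A_{k}}\ge \frac{k}{2}\sqrt{\lambda }\), i.e. \(A_{k}\ge k^{2}\lambda /4\). A short numeric illustration (ours, with an arbitrary \(\lambda \)):

```python
import math

# Numeric illustration of the growth forced by (32): with mu = 0 and fixed lam,
# sqrt(A_k) >= (k/2)*sqrt(lam), hence A_k >= k^2 * lam / 4.
lam, mu, A = 1.0, 0.0, 0.0
for k in range(1, 101):
    lam_mu = lam * (A * mu + 1.0)
    a = (lam_mu + math.sqrt(lam_mu**2 + 4.0 * lam_mu * A)) / 2.0  # a as in (21)
    A += a                                                        # A^+ as in (26)
    assert math.sqrt(A) >= k * math.sqrt(lam) / 2.0 - 1e-12
print(A, 100**2 * lam / 4.0)   # A_100 and the lower bound k^2 * lam / 4
```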

Appendix 2: Proof of Proposition 3

First note that the equivalence between (a) and (b) of Lemma 1 with \(\chi =\sum _{i=1}^{m}\alpha _{i}\phi (y)\), \(u_{0}=y\) and \(q(\cdot )=\sum _{i=1}^{m}\alpha _{i}\gamma _{i}(\cdot )+\Vert \cdot -x^{0}\Vert ^{2}/2\) (and hence \(Q=I+\sum _{i=1}^{m}\alpha _{i}\nabla ^{2}\gamma _{i}\)) implies that the hard constraint in problem (12) is equivalent to

$$\begin{aligned}&\left\langle \left( \sum _{i=1}^{m}\alpha _{i}\nabla \gamma _{i}(y)+y-x^{0}\right) \,,\,\left( I+\sum _{i=1}^{m}\alpha _{i}\nabla ^{2}\gamma _{i}\right) ^{-1}\left( \sum _{i=1}^{m}\alpha _{i}\nabla \gamma _{i}(y)+y-x^{0}\right) \right\rangle \nonumber \\&\quad +\,2\sum _{i=1}^{m}\alpha _{i}[\phi (y)-\gamma _{i}(y)] \le \Vert y-x^{0}\Vert ^{2}. \end{aligned}$$
(36)
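For readers who wish to experiment with this reformulation, the helper below (our own construction; the function name and argument layout are hypothetical) evaluates both sides of (36) for quadratic minorants \(\gamma _{i}\), given their gradients and Hessians at y and the gaps \(\phi (y)-\gamma _{i}(y)\).

```python
import numpy as np

def satisfies_36(alphas, grads, hessians, gaps, y, x0):
    """Check the reformulated hard constraint (36) for quadratic minorants gamma_i.

    alphas   -- nonnegative weights alpha_1, ..., alpha_m
    grads    -- list of gradients nabla gamma_i(y)
    hessians -- list of Hessians nabla^2 gamma_i (each zero or positive definite)
    gaps     -- list of the values phi(y) - gamma_i(y), each >= 0 since gamma_i <= phi
    """
    v = sum(a * g for a, g in zip(alphas, grads)) + (y - x0)
    Q = np.eye(y.size) + sum(a * H for a, H in zip(alphas, hessians))
    lhs = v @ np.linalg.solve(Q, v) + 2.0 * sum(a * d for a, d in zip(alphas, gaps))
    return lhs <= np.linalg.norm(y - x0)**2
```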

Now, assume that (b) holds. Then, in view of (13), the point \(\alpha (t):=(t\bar{\alpha }_{1},\ldots ,t\bar{\alpha }_{m})\) clearly satisfies (36), and hence is feasible for (12), for every \(t>0\). Therefore, for a fixed \(x^{*}\in X^{*}\), this feasibility implies that

$$\begin{aligned} \sum _{i=1}^{m}(t\bar{\alpha }_{i})\phi (y)&\le \sum _{i=1}^{m}(t\bar{\alpha }_{i})\gamma _{i}(x^{*})+\frac{1}{2}\Vert x^{*}-x^{0}\Vert ^{2}\\&\le t\left( \sum _{i=1}^{m}\bar{\alpha }_{i}\right) \phi (x^{*})+\frac{1}{2}\Vert x^{*}-x^{0}\Vert ^{2}, \end{aligned}$$

where the last inequality follows from the assumption that \(\gamma _{i}\le \phi \) and the non-negativity of \(t\bar{\alpha }_{1},\ldots ,t\bar{\alpha }_{m}\). Dividing this expression by \(t(\sum _{i=1}^{m}\bar{\alpha }_{i})>0\), and letting \(t\rightarrow \infty \), we then conclude that \(\phi (y)-\phi (x^{*})\le 0\), and hence that \(y\in X^{*}\). Also, the objective function of (12) at the point \(\alpha (t)\) converges to infinity as \(t\rightarrow \infty \). We have thus shown that (b) implies (a).

Now assume that (a) holds. The feasible set of (12) is then unbounded; since it is also closed and convex, it must have a nonzero direction of recession \(\bar{\alpha }:=(\bar{\alpha }_{1},\ldots ,\bar{\alpha }_{m})\). Moreover, since this set obviously contains the zero vector, it follows that \(\alpha (t):=(t\bar{\alpha }_{1},\ldots ,t\bar{\alpha }_{m})\) is feasible for (12) for every \(t>0\). This implies that \(\alpha (t)\) satisfies (36) for every \(t>0\) and that \(\bar{\alpha }_{i}\ge 0\) for every i. Using the latter fact, the assumption that \(\gamma _{i}\le \phi \) (and hence \(\gamma _{i}(y)\le \phi (y)\)) for every i, and the assumption that, for every i, \(\nabla ^{2}\gamma _{i}\) is either zero or positive definite (and hence \(\sum _{i=1}^{m}\bar{\alpha }_{i}\nabla ^{2}\gamma _{i}\) is either zero or positive definite), we easily see that (13) holds. \(\square \)

About this article

Cite this article

Monteiro, R.D.C., Ortiz, C. & Svaiter, B.F. An adaptive accelerated first-order method for convex optimization. Comput Optim Appl 64, 31–73 (2016). https://doi.org/10.1007/s10589-015-9802-0
