
An adaptive accelerated first-order method for convex optimization

Abstract

This paper presents a new accelerated variant of Nesterov’s method for solving composite convex optimization problems, in which certain acceleration parameters are chosen adaptively (and aggressively) so as to substantially improve practical performance compared to existing accelerated variants, while preserving the optimal iteration-complexity shared by these methods. Computational results demonstrate that the proposed adaptive accelerated method, endowed with a restarting scheme, outperforms other existing accelerated variants.

References

  1. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Img. Sci. 2(1), 183–202 (2009). http://dx.doi.org/10.1137/080716542

  2. Burer, S., Monteiro, R.D.C., Zhang, Y.: A computational study of a gradient-based log-barrier algorithm for a class of large-scale SDPs. Math. Program. 95, 359–379 (2003). http://dx.doi.org/10.1007/s10107-002-0353-7

  3. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. CORE Discussion Paper 2011/02, Center for Operations Research and Econometrics (CORE), Catholic University of Louvain, Dec 2010

  4. Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002). http://dx.doi.org/10.1007/s101070100263

  5. Goldfarb, D., Scheinberg, K.: Fast first-order methods for composite convex optimization with line search. Optimization Online preprint 3004 (2011). http://www.optimization-online.org/DB_HTML/2011/04/3004.html

  6. Gonzaga, C.C., Karas, E.W.: Fine tuning Nesterov’s steepest descent algorithm for differentiable convex programming. Math. Program., 1–26 (2012). http://dx.doi.org/10.1007/s10107-012-0541-z

  7. Gonzaga, C.C., Karas, E.W., Rossetto, D.R.: An optimal algorithm for constrained differentiable convex optimization. SIAM J. Optim. 23(4), 1939–1955 (2013). http://dx.doi.org/10.1137/110836602

  8. Lan, G., Lu, Z., Monteiro, R.D.C.: Primal-dual first-order methods with iteration-complexity for cone programming. Math. Program. 126(1), 1–29 (2011)

  9. Merritt, M., Zhang, Y.: Interior-point gradient method for large-scale totally nonnegative least squares problems. J. Optim. Theory Appl. 126, 191–202 (2005). http://dx.doi.org/10.1007/s10957-005-2668-z

  10. Monteiro, R.D.C., Svaiter, B.F.: An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM J. Optim. 23(2), 1092–1125 (2013). http://epubs.siam.org/doi/abs/10.1137/110833786

  11. Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence \(O(1/k^2)\). Doklady AN SSSR 269(3), 543–547 (1983)

  12. Nesterov, Y.: Introductory lectures on convex optimization: a basic course. Applied Optimization, vol. 87. Kluwer Academic Publishers, Boston (2004)

  13. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)

  14. Nesterov, Y.: Dual extrapolation and its applications to solving variational inequalities and related problems. Math. Program. 109(2–3, Ser. B), 319–344 (2007)

  15. Nesterov, Y.: Gradient methods for minimizing composite objective function. CORE, Leuven (2007)

  16. O’Donoghue, B., Candès, E.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math., 1–18 (2013). http://dx.doi.org/10.1007/s10208-013-9150-3

  17. Povh, J., Rendl, F., Wiegele, A.: A boundary point method to solve semidefinite programs. Computing 78, 277–286 (2006). http://dx.doi.org/10.1007/s00607-006-0182-2

  18. Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization. Submitted to SIAM J. Optim. (2008)

  19. Zhao, X.-Y., Sun, D., Toh, K.-C.: A Newton-CG augmented Lagrangian method for semidefinite programming. SIAM J. Optim. 20(4), 1737–1765 (2010). http://link.aip.org/link/?SJE/20/1737/1

Acknowledgments

We thank Elizabeth W. Karas for generously providing us with the code of the algorithms in [7]. Renato D.C. Monteiro: The work of this author was partially supported by NSF Grants CMMI-0900094 and CMMI-1300221, and ONR Grant N00014-11-1-0062. Benar F. Svaiter: The work of this author was partially supported by CNPq Grants No. 303583/2008-8, 302962/2011-5, 480101/2008-6, 474944/2010-7, and FAPERJ Grants E-26/102.821/2008 and E-26/102.940/2011.

Author information

Correspondence to Camilo Ortiz.

Appendices

Appendix 1: Technical results used in Section 2

This appendix establishes a technical result, namely Lemma 4, which is used in the proof of Proposition 1 of Sect. 2.

Throughout this appendix we assume that the functions \(\phi \), f and h, and the set \(\Omega \), satisfy conditions C.1–C.5 of Sect. 2. Before stating and proving Lemma 4, we establish two technical results.

Lemma 2

Let \(A\ge 0\), \(\lambda >0\) and \(x^{0}\), \(y\in \mathcal {X}\) be given, and assume that \(\Gamma :\mathcal {X}\rightarrow \bar{\mathbb {R}}\) is a proper closed convex function such that \(A\Gamma \) is \(A\mu \)-strongly convex. Assume also that the triple \((y,A,\Gamma )\) satisfies

$$\begin{aligned} A\Gamma \le A\phi ,\quad A\phi (y)\le & {} \min _{u\in \mathcal {X}}\left\{ A\Gamma (u)+\frac{1}{2}\Vert u-x^{0}\Vert ^{2}\right\} . \end{aligned}$$
(19)

Also, define \(x\in \mathcal {X}\), \(a>0\) and \(\tilde{x}\in \mathcal {X}\) as

$$\begin{aligned}&x:=\arg \min _{u\in \mathcal {X}}\left\{ A\Gamma (u)+\frac{1}{2}\Vert u-x^{0}\Vert ^{2}\right\} , \end{aligned}$$
(20)
$$\begin{aligned}&a:=\frac{\lambda (A\mu +1)+\sqrt{\lambda ^{2}(A\mu +1)^{2}+4\lambda (A\mu +1)A}}{2},\qquad \tilde{x}:=\frac{A}{A+a}y+\frac{a}{A+a}x. \end{aligned}$$
(21)

Then, for any proper closed convex function \(\gamma :\mathcal {X}\rightarrow \bar{\mathbb {R}}\) minorizing \(\phi \), we have

$$\begin{aligned} \min _{u\in \mathcal {X}}\left\{ a\gamma (u)+A\Gamma (u)+\frac{1}{2}\Vert u-x^{0}\Vert ^{2}\right\} \ge (A+a)\theta \end{aligned}$$

where

$$\begin{aligned} \theta :=\min _{u\in \mathcal {X}}\left\{ \gamma (u)+\frac{1}{2\lambda }\Vert u-\tilde{x}\Vert ^{2}\right\} . \end{aligned}$$
(22)

Proof

Note that the definition of a in (21) implies that

$$\begin{aligned} \lambda =\frac{a^{2}}{(A+a)(A\mu +1)}. \end{aligned}$$
(23)

Now, let an arbitrary \(u\in \mathcal {X}\) be given and define

$$\begin{aligned} \tilde{u}:=\frac{A}{A+a}y+\frac{a}{A+a}u. \end{aligned}$$

Clearly, in view of the second identity in (21), we have

$$\begin{aligned} \tilde{u}-\tilde{x}=\frac{a}{A+a}(u-x). \end{aligned}$$

In view of (19), (20), the last two relations, and the fact that \(\gamma \) is a convex function minorizing \(\phi \) and the function \(A\Gamma (\cdot )+\Vert \cdot -x^{0}\Vert ^{2}/2\) is \((A\mu +1)\)-strongly convex, we conclude that

$$\begin{aligned} a\gamma (u)&+A\Gamma (u)+\frac{1}{2}\Vert u-x^{0}\Vert ^{2}\\&\ge a\gamma (u)+\frac{A\mu +1}{2}\Vert u-x\Vert ^{2}+\min _{u\in \mathcal {X}}\left\{ A\Gamma (u)+\frac{1}{2}\Vert u-x^{0}\Vert ^{2}\right\} \\&\ge a\gamma (u)+\frac{A\mu +1}{2}\Vert u-x\Vert ^{2}+A\gamma (y)\\&\ge (A+a)\gamma \left( \tilde{u}\right) +\frac{A\mu +1}{2}\Vert u-x\Vert ^{2}\\&=(A+a)\left[ \gamma (\tilde{u})+\frac{(A+a)(A\mu +1)}{2a^{2}}\Vert \tilde{u}-\tilde{x}\Vert ^{2}\right] \ge (A+a)\theta \end{aligned}$$

where the last inequality is due to (23) and the definition of \(\theta \) in (22). Since the above inequality holds for every \(u\in \mathcal {X}\), the conclusion of the lemma follows. \(\square \)
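As a quick sanity check (ours, not part of the paper), the following snippet verifies numerically that the value of \(a\) defined in (21) satisfies identity (23); the values of \(A\), \(\lambda \) and \(\mu \) below are arbitrary.

```python
import math

# Arbitrary sample values (illustrative only): A >= 0, lambda > 0, mu >= 0.
A, lam, mu = 3.0, 0.5, 0.2

# a as defined in (21).
lam_mu = lam * (A * mu + 1.0)
a = (lam_mu + math.sqrt(lam_mu**2 + 4.0 * lam_mu * A)) / 2.0

# Identity (23): lambda = a^2 / ((A + a)(A*mu + 1)).
print(abs(lam - a**2 / ((A + a) * (A * mu + 1.0))) < 1e-12)  # True
```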

Lemma 3

Let \(\lambda >0\), \(\xi \in \mathcal {X}\) and a proper closed \(\mu \)-strongly convex function \(g:\mathcal {X}\rightarrow \bar{\mathbb {R}}\) be given and denote the optimal solution of

$$\begin{aligned} \min _{x\in \mathcal {X}}\left\{ g(x)+\frac{1}{2\lambda }\Vert x-\xi \Vert ^{2}\right\} \end{aligned}$$
(24)

by \(\bar{x}\). Then, the function \(\gamma :\mathcal {X}\rightarrow \mathbb {R}\) defined as

$$\begin{aligned} \gamma (u):=g(\bar{x})+\frac{1}{\lambda }\langle {\xi -\bar{x}},{u-\bar{x}}\rangle +\frac{\mu }{2}\Vert u-\bar{x}\Vert ^{2}\quad \forall u\in \mathcal {X} \end{aligned}$$

has the property that \(\gamma \le g\) and \(\bar{x}\) is the optimal solution of

$$\begin{aligned} \min _{u\in \mathcal {X}}\left\{ \gamma (u)+\frac{1}{2\lambda }\Vert u-\xi \Vert ^{2}\right\} . \end{aligned}$$
(25)

As a consequence, the optimal value of (25) is the same as that of (24).

Proof

The optimality condition for (24) implies that \((\xi -\bar{x})/\lambda \in \partial g(\bar{x})\), and hence that \(\gamma \le g\) in view of the definition of \(\gamma \) and the fact that g is a proper closed \(\mu \)-strongly convex function. Moreover, \(\bar{x}\) clearly satisfies the optimality condition for (25), from which the second claim of the lemma follows. Finally, the latter conclusion of the lemma follows immediately from its second claim. \(\square \)
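To illustrate Lemma 3 (our own example, not from the paper), the sketch below instantiates it for the one-dimensional \(\mu \)-strongly convex function \(g(x)=\frac{\mu }{2}x^{2}+|x|\), for which the minimizer of (24) has a closed soft-thresholding form, and checks numerically that \(\gamma \le g\) and that (24) and (25) have the same optimal value.

```python
import numpy as np

# Illustrative 1-D instance: g(x) = (mu/2)*x^2 + |x| is proper, closed and mu-strongly convex.
mu, lam, xi = 0.5, 0.8, 2.0

def g(x):
    return 0.5 * mu * x**2 + np.abs(x)

# Closed-form solution of (24) for this g (soft-thresholding plus shrinkage).
xbar = np.sign(xi) * max(abs(xi) - lam, 0.0) / (1.0 + lam * mu)

# The minorant gamma from Lemma 3.
def gamma(u):
    return g(xbar) + (xi - xbar) / lam * (u - xbar) + 0.5 * mu * (u - xbar)**2

u = np.linspace(-5.0, 5.0, 2001)
obj24 = g(u) + (u - xi)**2 / (2.0 * lam)
obj25 = gamma(u) + (u - xi)**2 / (2.0 * lam)
print(np.all(gamma(u) <= g(u) + 1e-12))          # gamma minorizes g
print(abs(u[np.argmin(obj25)] - xbar) < 1e-2)    # xbar also solves (25)
print(abs(obj24.min() - obj25.min()) < 1e-6)     # equal optimal values (up to grid error)
```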

The main result of this appendix is as follows.

Lemma 4

Let \(A\ge 0\), \(0<\lambda \le 1/(L-\mu _{f})\), \(x^{0},y\in \mathcal {X}\) and a proper closed \(\mu \)-strongly convex function \(\Gamma :\mathcal {X}\rightarrow \mathbb {R}\) be given, and assume that the triple \((y,A,\Gamma )\) satisfies (19). Let x, a and \(\tilde{x}\) be as in (20) and (21), and define

$$\begin{aligned} \tilde{x}_{\Omega }&:=\Pi _{\Omega }(\tilde{x}),\qquad A^{+}:=A+a, \end{aligned}$$
(26)
$$\begin{aligned} y^{+}&:=\arg \min _{u\in \mathcal {X}}\left\{ p(u)+\frac{1}{2\lambda }\Vert u-\tilde{x}\Vert ^{2}\right\} \end{aligned}$$
(27)

where

$$\begin{aligned} p(u):=f(\tilde{x}_{\Omega })+\langle {\nabla f(\tilde{x}_{\Omega })},{u-\tilde{x}_{\Omega }}\rangle +\frac{\mu _{f}}{2}\Vert u-\tilde{x}_{\Omega }\Vert ^{2}+h(u)\quad \forall u\in \mathcal {X}. \end{aligned}$$
(28)

Also, define the functions \(\gamma ,\Gamma ^{+}:\mathcal {X}\rightarrow \bar{\mathbb {R}}\) as

$$\begin{aligned} \gamma (u):= & {} p(y^{+})+\frac{1}{\lambda }\langle {\tilde{x}-y^{+}},{u-y^{+}}\rangle +\frac{\mu }{2}\Vert u-y^{+}\Vert ^{2}\quad \forall u\in \mathcal {X}, \end{aligned}$$
(29)
$$\begin{aligned} \Gamma ^{+}:= & {} \frac{A}{A+a}\Gamma +\frac{a}{A+a}\gamma \end{aligned}$$
(30)

Then, \(\Gamma ^{+}\) is a proper closed \(\mu \)-strongly convex function and the triple \((y^{+},A^{+},\Gamma ^{+})\) satisfies

$$\begin{aligned}&A^{+}\phi (y^{+}) \le \min _{u\in \mathcal {X}}\left\{ A^{+}\Gamma ^{+}(u)+\frac{1}{2}\Vert u-x^{0}\Vert ^{2}\right\} ,\qquad A^{+}\Gamma ^{+}\le A^{+}\phi , \end{aligned}$$
(31)
$$\begin{aligned}&\sqrt{A^{+}}\ge \sqrt{A}+\frac{1}{2}\sqrt{\lambda (A\mu +1)}. \end{aligned}$$
(32)

Moreover, if \(\Gamma \) is a quadratic function with Hessian equal to \(\mu I\), then the (unique) optimal solution \(x^{+}\) of the minimization problem in (31) can be obtained as

$$\begin{aligned} x^{+}=\frac{1}{1+\mu (A+a)}\left[ x-\frac{a}{\lambda }(\tilde{x}-y^{+})+\mu (Ax+ay^{+})\right] . \end{aligned}$$
(33)
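Before turning to the proof, here is a small sketch (ours) of how the quantities in (21), (26)–(28) and (33) can be iterated on a toy instance with \(f(x)=\frac{1}{2}\Vert Bx-b\Vert ^{2}\), \(h\equiv 0\), \(\Omega =\mathcal {X}=\mathbb {R}^{n}\) and \(\mu =\mu _{f}=0\), so that \(\Pi _{\Omega }\) is the identity and (27) reduces to a gradient step from \(\tilde{x}\). This is only a simplified instantiation of the update, not the paper's adaptive accelerated method.

```python
import numpy as np

# Toy composite instance: f(x) = 0.5*||B x - b||^2, h = 0, Omega = R^n,
# hence mu_f = 0, Pi_Omega is the identity, and (27) is a plain gradient step.
rng = np.random.default_rng(0)
B = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
grad_f = lambda x: B.T @ (B @ x - b)
L = np.linalg.norm(B.T @ B, 2)            # Lipschitz constant of grad f
lam = 1.0 / L                             # 0 < lambda <= 1/(L - mu_f)
mu = mu_f = 0.0

# With A = 0, condition (19) holds trivially and x = argmin{0.5*||u - x0||^2} = x0.
x0 = np.zeros(5)
A, y, x = 0.0, x0.copy(), x0.copy()

for _ in range(50):
    lam_mu = lam * (A * mu + 1.0)
    a = (lam_mu + np.sqrt(lam_mu**2 + 4.0 * lam_mu * A)) / 2.0   # a as in (21)
    x_tilde = (A * y + a * x) / (A + a)                          # x_tilde as in (21)
    y_plus = x_tilde - lam * grad_f(x_tilde)                     # (27)-(28) with h = 0, mu_f = 0
    x_plus = (x - (a / lam) * (x_tilde - y_plus)
              + mu * (A * x + a * y_plus)) / (1.0 + mu * (A + a))  # (33)
    A, y, x = A + a, y_plus, x_plus

x_star = np.linalg.lstsq(B, b, rcond=None)[0]
print(np.linalg.norm(y - x_star))         # distance to the least-squares solution
```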

Proof

We first claim that

$$\begin{aligned} \gamma \le \phi ,\qquad \phi (y^{+})\le \min _{u\in \mathcal {X}}\left\{ \gamma (u)+\frac{1}{2\lambda }\Vert u-\tilde{x}\Vert ^{2}\right\} . \end{aligned}$$
(34)

To prove the above claim, note that conditions C.2 and C.3, the non-expansiveness property of \(\Pi _{\Omega }\), and the assumption that \(\lambda \in (0,1/(L-\mu _{f})]\), imply that

$$\begin{aligned} p(u)\le \phi (u)\le p(u)+\frac{L-\mu _{f}}{2}\Vert u-\tilde{x}_{\Omega }\Vert ^{2}\le p(u)+\frac{1}{2\lambda }\Vert u-\tilde{x}\Vert ^{2}\quad \forall u\in \Omega . \end{aligned}$$
(35)

Then, in view of (27) and the above relation with \(u=y^{+}\), we have

$$\begin{aligned} \min _{u\in \mathcal {X}}\left\{ p(u)+\frac{1}{2\lambda }\Vert u-\tilde{x}\Vert ^{2}\right\}&=p(y^{+})+\frac{1}{2\lambda }\Vert y^{+}-\tilde{x}\Vert ^{2}\ge \phi (y^{+}). \end{aligned}$$

The claim now follows from the first inequality in (35), the above relation, the definitions of \(\gamma \) and p, and Lemma 3 with \(g=p\) and \(\xi =\tilde{x}\) (and hence \(\bar{x}=y^{+}\)).

Now, using the assumption that (19) holds, the second relation in (34) and Lemma 2, we conclude that

$$\begin{aligned} \min _{u\in \mathcal {X}}\left\{ a\gamma (u)+A\Gamma (u)+\frac{1}{2}\Vert u-x^{0}\Vert ^{2}\right\}&\ge (A+a)\min _{u\in \mathcal {X}}\left\{ \gamma (u)+\frac{1}{2\lambda }\Vert u-\tilde{x}\Vert ^{2}\right\} \\&\ge (A+a)\phi (y^{+}), \end{aligned}$$

and hence that the first inequality in (31) holds in view of (26) and (30). Moreover, the first inequality in (34), relations (26) and (30), and the first inequality of (19), clearly imply the second inequality in (31).

To show (32), let \(\lambda _{\mu }:=\lambda (A\mu +1)\) and note that the definition of a in (21) implies that

$$\begin{aligned} a\ge \frac{\lambda _{\mu }}{2}+\sqrt{\lambda _{\mu }A}. \end{aligned}$$

This inequality together with the definition of \(A^{+}\) in (26) then yields

$$\begin{aligned} A^{+}\ge A+\left( \frac{\lambda _{\mu }}{2}+\sqrt{\lambda _{\mu }A}\right) \ge \left( \sqrt{A}+\frac{1}{2}\sqrt{\lambda _{\mu }}\right) ^{2}, \end{aligned}$$

showing that (32) holds. Finally, to show (33), first observe that the optimality conditions for (20) and (31) imply that \(x=x^{0}-A\nabla \Gamma (x)\) and \(x^{+}=x^{0}-A^{+}\nabla \Gamma ^{+}(x^{+})\). These two identities together with relations (26), (29) and (30), and the assumption that \(\Gamma \) is a quadratic function with Hessian equal to \(\mu I\), then imply that

$$\begin{aligned} x^{+}&=x^{0}-A\nabla \Gamma (x^{+})-a\nabla \gamma (x^{+})\\&=x^{0}-A\nabla \Gamma (x)-a\nabla \gamma (x^{+})+A[\nabla \Gamma (x)-\nabla \Gamma (x^{+})]\\&=x-a\left[ \frac{1}{\lambda }(\tilde{x}-y^{+})+\mu (x^{+}-y^{+})\right] +A[\mu (x-x^{+})], \end{aligned}$$

and hence that (33) holds. \(\square \)
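Inequality (32) is the source of the growth behind the optimal \(O(1/k^{2})\) iteration-complexity mentioned in the abstract: with a fixed \(\lambda \) and \(\mu =0\), summing the square roots over k steps gives \(\sqrt{A_{k}}\ge \frac{k}{2}\sqrt{\lambda }\), i.e. \(A_{k}\ge k^{2}\lambda /4\). A short numeric illustration (ours, with an arbitrary \(\lambda \)):

```python
import math

# Numeric illustration of the growth forced by (32): with mu = 0 and fixed lam,
# sqrt(A_k) >= (k/2)*sqrt(lam), hence A_k >= k^2 * lam / 4.
lam, mu, A = 1.0, 0.0, 0.0
for k in range(1, 101):
    lam_mu = lam * (A * mu + 1.0)
    a = (lam_mu + math.sqrt(lam_mu**2 + 4.0 * lam_mu * A)) / 2.0  # a as in (21)
    A += a                                                        # A^+ as in (26)
    assert math.sqrt(A) >= k * math.sqrt(lam) / 2.0 - 1e-12
print(A, 100**2 * lam / 4.0)   # A_100 and the lower bound k^2 * lam / 4
```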

Appendix 2: Proof of Proposition 3

First note that the equivalence between (a) and (b) of Lemma 1 with \(\chi =\sum _{i=1}^{m}\alpha _{i}\phi (y)\), \(u_{0}=y\) and \(q(\cdot )=\sum _{i=1}^{m}\alpha _{i}\gamma _{i}(\cdot )+\Vert \cdot -x^{0}\Vert ^{2}/2\) (and hence \(Q=I+\sum _{i=1}^{m}\alpha _{i}\nabla ^{2}\gamma _{i}\)) implies that the hard constraint in problem (12) is equivalent to

$$\begin{aligned}&\left\langle \left( \sum _{i=1}^{m}\alpha _{i}\nabla \gamma _{i}(y)+y-x^{0}\right) \,,\,\left( I+\sum _{i=1}^{m}\alpha _{i}\nabla ^{2}\gamma _{i}\right) ^{-1}\left( \sum _{i=1}^{m}\alpha _{i}\nabla \gamma _{i}(y)+y-x^{0}\right) \right\rangle \nonumber \\&\quad +\,2\sum _{i=1}^{m}\alpha _{i}[\phi (y)-\gamma _{i}(y)] \le \Vert y-x^{0}\Vert ^{2}. \end{aligned}$$
(36)
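For readers who wish to experiment with this reformulation, the helper below (our own construction; the function name and argument layout are hypothetical) evaluates both sides of (36) for quadratic minorants \(\gamma _{i}\), given their gradients and Hessians at y and the gaps \(\phi (y)-\gamma _{i}(y)\).

```python
import numpy as np

def satisfies_36(alphas, grads, hessians, gaps, y, x0):
    """Check the reformulated hard constraint (36) for quadratic minorants gamma_i.

    alphas   -- nonnegative weights alpha_1, ..., alpha_m
    grads    -- list of gradients nabla gamma_i(y)
    hessians -- list of Hessians nabla^2 gamma_i (each zero or positive definite)
    gaps     -- list of the values phi(y) - gamma_i(y), each >= 0 since gamma_i <= phi
    """
    v = sum(a * g for a, g in zip(alphas, grads)) + (y - x0)
    Q = np.eye(y.size) + sum(a * H for a, H in zip(alphas, hessians))
    lhs = v @ np.linalg.solve(Q, v) + 2.0 * sum(a * d for a, d in zip(alphas, gaps))
    return lhs <= np.linalg.norm(y - x0)**2
```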

Now, assume that (b) holds. Then, in view of (13), the point \(\alpha (t):=(t\bar{\alpha }_{1},\ldots ,t\bar{\alpha }_{m})\) clearly satisfies (36), and hence is feasible for (12), for every \(t>0\). Therefore, for a fixed \(x^{*}\in X^{*}\), this feasibility implies that

$$\begin{aligned} \sum _{i=1}^{m}(t\bar{\alpha }_{i})\phi (y)&\le \sum _{i=1}^{m}(t\bar{\alpha }_{i})\gamma _{i}(x^{*})+\frac{1}{2}\Vert x^{*}-x^{0}\Vert ^{2}\\&\le t\left( \sum _{i=1}^{m}\bar{\alpha }_{i}\right) \phi (x^{*})+\frac{1}{2}\Vert x^{*}-x^{0}\Vert ^{2}, \end{aligned}$$

where the last inequality follows from the assumption that \(\gamma _{i}\le \phi \) and the non-negativity of \(t\bar{\alpha }_{1},\ldots ,t\bar{\alpha }_{m}\). Dividing this expression by \(t(\sum _{i=1}^{m}\bar{\alpha }_{i})>0\), and letting \(t\rightarrow \infty \), we then conclude that \(\phi (y)-\phi (x^{*})\le 0\), and hence that \(y\in X^{*}\). Also, the objective function of (12) at the point \(\alpha (t)\) converges to infinity as \(t\rightarrow \infty \). We have thus shown that (b) implies (a).

Now assume that (a) holds. The feasible set of (12) is then unbounded; since it is also closed and convex, it must have a nonzero direction of recession \(\bar{\alpha }:=(\bar{\alpha }_{1},\ldots ,\bar{\alpha }_{m})\). Moreover, since this set obviously contains the zero vector, it follows that \(\alpha (t):=(t\bar{\alpha }_{1},\ldots ,t\bar{\alpha }_{m})\) is feasible for (12) for every \(t>0\). This implies that \(\alpha (t)\) satisfies (36) for every \(t>0\) and that \(\bar{\alpha }_{i}\ge 0\) for every i. Using the latter fact, the assumption that \(\gamma _{i}\le \phi \) (and hence \(\gamma _{i}(y)\le \phi (y)\)) for every i, and the assumption that, for every i, \(\nabla ^{2}\gamma _{i}\) is either zero or positive definite (and hence \(\sum _{i=1}^{m}\bar{\alpha }_{i}\nabla ^{2}\gamma _{i}\) is either zero or positive definite), we easily see that (13) holds. \(\square \)

About this article

Cite this article

Monteiro, R.D.C., Ortiz, C. & Svaiter, B.F. An adaptive accelerated first-order method for convex optimization. Comput Optim Appl 64, 31–73 (2016). https://doi.org/10.1007/s10589-015-9802-0
