Abstract
In a real Hilbert space, we study a new class of forward-backward algorithms for structured non-smooth minimization problems. For a special choice of the parameters, we recover the method AFB (Accelerated Forward-Backward), which was recently discussed as an enhanced variant of FISTA (Fast Iterative Soft Thresholding Algorithm). Our algorithms enjoy the well-known properties of AFB: they generate convergent sequences \((x_n)\) that minimize the function values at the rate \(o(n^{-2})\). Another important feature of our processes is that they can be regarded as discrete models suggested by first-order formulations of Newton-like dynamical systems. This permits us to extend to the non-smooth setting a property of fast convergence to zero of the gradients, established so far only for discrete Newton-like dynamics with smooth potentials. In particular, as a new result, we show that the latter property also applies to AFB. To prove this stability phenomenon, we develop a technical analysis that can also be useful for many other related developments. Numerical experiments are furthermore performed to illustrate the properties of the considered algorithms in comparison with existing ones.
Appendix 1. Proof of Proposition 3.2
Let \(\{\kappa, e, \nu_n\}\) be positive parameters, and set \(\varrho = 1 - \kappa\), \(\tau_n = e + \nu_{n+1}\) and \(u_n = y_n - x_n\). It is easily seen that (3.8a) can be rewritten as (for \(n \geq p\))
The remainder of the proof is divided into the following parts (r1)–(r4):
(r1) An estimate from the inertial part of the method. Given \((s,q) \in [0,\infty) \times {\mathcal H}\), we begin by proving that the discrete derivative \(\dot{G}_{n+1}(s,q)\) satisfies
In order to get this result, we readily notice that \(\dot {G}_{n+1}(s,q)\) can be formulated as
where \(a_n := \langle q - x_n, u_n\rangle\), \(b_n := \frac{1}{2}\|x_n - q\|^2\) and \(c_n := \frac{1}{2}\|u_n\|^2\). For the sake of clarity, and so as to estimate the right-hand side of (A.3), we set
Clearly, by \(a_n = \langle q - x_n, u_n\rangle\) and \(u_n = -\frac{1}{\kappa}\dot{y}_{n+1}\) (from (A.1c)), we get
Again from \(a_n = \langle q - x_n, u_n\rangle\), and noticing that \(\dot{u}_{n+1} = \dot{y}_{n+1} - \dot{x}_{n+1}\) (as \(u_n = y_n - x_n\)), we readily have
Taking the scalar product of each side of (A.1b) with \(q - \dot{x}_{n+1}\), along with \(u_n = -\frac{1}{\kappa}\dot{y}_{n+1}\) (from (A.1c)), amounts to \(P_n - W_n = \kappa^{-1}\theta_n R_n\), which, by \(\theta_n = \tau_n^{-1}(\nu_n - \varrho\nu_{n+1})\) (from (A.1a)), is equivalent to
Therefore, by (A.4), (A.5) and (A.6), and recalling that \(\varrho = 1 - \kappa\), we are led to
From \(b_{n+1} = \frac{1}{2}\|x_{n+1} - q\|^2\), we readily get
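For convenience, the elementary Hilbert-space identity behind this step can be written out; here we assume the discrete-derivative notation \(\dot{b}_{n+1} := b_{n+1} - b_n\) and \(\dot{x}_{n+1} := x_{n+1} - x_n\), consistent with the usage of \(\dot{u}_{n+1}\) above (a sketch, not the paper's exact display):

```latex
% Discrete derivative of b_n = (1/2)||x_n - q||^2, with
% \dot{x}_{n+1} := x_{n+1} - x_n and \dot{b}_{n+1} := b_{n+1} - b_n.
\begin{aligned}
\dot{b}_{n+1}
&= \tfrac{1}{2}\|x_{n+1}-q\|^{2} - \tfrac{1}{2}\|x_{n}-q\|^{2} \\
&= \langle x_{n+1}-x_{n},\, x_{n+1}-q\rangle - \tfrac{1}{2}\|x_{n+1}-x_{n}\|^{2} \\
&= \langle \dot{x}_{n+1},\, x_{n+1}-q\rangle - \tfrac{1}{2}\|\dot{x}_{n+1}\|^{2}.
\end{aligned}
```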
In addition, by \(c_{n+1} = \frac{1}{2}\|u_{n+1}\|^2\) and \(\dot{u}_{n+1} = -\kappa u_n - \dot{x}_{n+1}\) (from \(u_n = y_n - x_n\) and \(\dot{y}_{n+1} = -\kappa u_n\)), we immediately have
In light of (A.3) together with (A.7), (A.8) and (A.9), we are led to
where the quantity \( \bar {\eta }_{n}\) is given by
This leads to the desired result.
(r2) An estimate from the proximal part of the method. Let us establish that, for any \(\xi_n \neq 1\), it holds that
Indeed, we have \(\dot{x}_{n+1} = -\theta_n u_n - \chi_n^*\) (from (A.1b)), which, for any \(\xi_n \neq 1\), can be rewritten as
where \(H_{n}= \dot {x}_{n+1} + (1-\xi _n)^{-1}\theta _n u_{n} \). Furthermore, by \(-\chi _{n}^{*}= \dot {x}_{n+1}+\theta _{n} u_{n}\) (again using (A.1b)) and denoting \(Q_{n}=\langle \dot {x}_{n+1}, u_{n}\rangle \), we simply obtain
Therefore, by adding \((1/2) \| \chi _{n}^{*}\|^{2}\) to the scalar product of the left side of equality (A.11) with \(\chi _{n}^{*}\), and using (A.12) and \( \| \chi _{n}^{*}\|^{2}= \|\dot {x}_{n+1}+ \theta _{n} u_{n}\|^{2}\), we get
This yields (A.10).
(r3) Combining proximal and inertial effects. Let (3.10) hold and set \(\rho_n = 1 - (1-\kappa)\frac{\nu_{n+1}}{\nu_n}\). It is worth noticing that the term \(\theta_n\) involved in (A.1) can be simply expressed as
So, by (A.13), in light of \(\rho_n > 0\) (from condition (3.3a)), we deduce that \((\theta_n)\) is a positive sequence.
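The positivity can be made explicit using the expression \(\theta_n = \frac{\nu_n \rho_n}{\tau_n}\) recalled later in part (r3); this is a short verification, assuming \(\nu_n > 0\) and \(\rho_n > 0\):

```latex
% theta_n is a quotient of positive quantities:
\theta_n \;=\; \frac{\nu_n\,\rho_n}{\tau_n},
\qquad \nu_n > 0,\quad \rho_n > 0,\quad \tau_n = e + \nu_{n+1} > 0
\;\Longrightarrow\; \theta_n > 0 .
```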
Now, we introduce the real sequence \((\bar {\gamma }_{n})\) defined by
Clearly, by (A.14), together with \(\tau_n > 0\) (as \(\tau_n := e + \nu_{n+1}\)) and \(\rho_n > 0\), we have
Next, given \(s > 0\), we show that the iterates produced by (3.8a) (or, equivalently, by (A.1)) verify
where Tn(u, x) is defined for any \((u,x) \in {\mathcal H}^{2}\) by
together with the parameters
Indeed, by (A.2) and setting \(Q=\langle \dot {x}_{n+1} , u_{n} \rangle \), we know that
Moreover, in light of \(\bar{\gamma}_n \neq 1\) (from (A.15)), by using (A.10) (with the special value \(\xi_n = \bar{\gamma}_n := 1 - \frac{s\rho_n}{\tau_n}\)) and recalling that \(\theta_n = \frac{\nu_n\rho_n}{\tau_n}\), we obtain
Then, multiplying equality (A.20) by \(\rho _{n}^{-1} {\tau _{n}^{2}}\), and adding the resulting equality to (A.19), we get
Hence, noticing that \(\nu_n\rho_n = \nu_n - \varrho\nu_{n+1}\), we infer that (A.16)–(A.17) is actually satisfied together with the parameters
This leads to the desired result.
(r4) Finally, we give an alternative formulation of the quantity \(T_n(u,x)\) given by (A.17)–(A.18). For this purpose, we begin by reformulating \(\sigma_n\). By the definitions \(\tau_n := e + \nu_{n+1}\) and \(\bar{\gamma}_n := 1 - s\frac{\rho_n}{\tau_n}\), an easy computation gives
where \(\tau_{n,t} = t + 2\nu_{n+1}\) (for \(t \geq 0\)). As a consequence, by the previous definition of \(\sigma_n\) (in (A.18)), we obtain
Then we consider the following two situations relative to the constant \(\kappa\):
- In the special case when \(\kappa = 1\) (hence \(\rho_n = 1\) and \(\rho_n^{-1} = 1\)), we obviously have \(w_n = 0\) and \(\eta_n = 0\). Then, for \((u,x) \in {\mathcal H}^2\), by definition of \(T_n\) (in (A.17)) along with \(\sigma_n = \frac{(e-s)}{2}\tau_{n,e}\) (from (A.21)), we obtain
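The vanishing of \(w_n\) and \(\eta_n\) when \(\kappa = 1\) can be checked directly from the expressions \(w_n = \varrho\nu_{n+1}(\nu_{n+1}+s)\) and \(2\eta_n = \varrho\rho_n\nu_n\nu_{n+1}\) recalled in part (r4); this one-line verification is worked out for convenience:

```latex
\kappa = 1 \;\Longrightarrow\; \varrho = 1 - \kappa = 0
\;\Longrightarrow\;
w_n = \varrho\,\nu_{n+1}(\nu_{n+1}+s) = 0
\quad\text{and}\quad
\eta_n = \tfrac{1}{2}\,\varrho\,\rho_n \nu_n \nu_{n+1} = 0 .
```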
- For \(\kappa \in (0,1) \cup (1,\infty)\) (hence \(\eta_n \neq 0\)), also setting \(\varsigma_n := \frac{w_n}{2\eta_n}\) and \(\psi_n := 4\sigma_n\eta_n - w_n^2\), by definition of \(T_n\) (in (A.17)) we classically have
On the one hand, by \(w_n = \varrho\nu_{n+1}(\nu_{n+1} + s)\) (from (A.18)) and remembering that \(\tau_{n,s} = s + 2\nu_{n+1}\), we simply have \(w_n^2 = (\varrho\nu_{n+1})^2\left((\nu_{n+1})^2 + s\tau_{n,s}\right)\). Hence, by (A.21), while using the definition of \(\rho_n\), and setting \(S_n := \varrho\rho_n\nu_n\nu_{n+1}\) (so that \(S_n = 2\eta_n\) and \(\psi_n = 2\sigma_n S_n - w_n^2\)), we obtain
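The squared expression for \(w_n\) follows from a direct expansion, worked out here for the reader:

```latex
w_n^2 = (\varrho\,\nu_{n+1})^2(\nu_{n+1}+s)^2,
\qquad
(\nu_{n+1}+s)^2
= (\nu_{n+1})^2 + s\,(s + 2\nu_{n+1})
= (\nu_{n+1})^2 + s\,\tau_{n,s}.
```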
It is also easily checked that \(S_n\left(\rho_n^{-1} - 1\right) = (\varrho\nu_{n+1})^2\), which by the previous equality yields
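This identity can be verified directly from \(\rho_n = 1 - (1-\kappa)\frac{\nu_{n+1}}{\nu_n}\) and \(S_n = \varrho\rho_n\nu_n\nu_{n+1}\):

```latex
\rho_n^{-1} - 1
= \frac{1-\rho_n}{\rho_n}
= \frac{\varrho\,\nu_{n+1}}{\rho_n \nu_n},
\qquad
S_n\left(\rho_n^{-1}-1\right)
= \varrho\,\rho_n \nu_n \nu_{n+1}\cdot\frac{\varrho\,\nu_{n+1}}{\rho_n \nu_n}
= (\varrho\,\nu_{n+1})^2 .
```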
Then, noticing that \(e\tau_{n,e} - s\tau_{n,s} = (e-s)\tau_{n,e+s}\) (as \(\tau_{n,t} := t + 2\nu_{n+1}\), for \(t \geq 0\)), we infer that \(\psi_n = (e-s)\left(S_n\tau_{n,e} + (\varrho\nu_{n+1})^2\tau_{n,e+s}\right)\), which by (A.23) entails that
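The algebraic identity invoked here expands as follows:

```latex
e\,\tau_{n,e} - s\,\tau_{n,s}
= e(e + 2\nu_{n+1}) - s(s + 2\nu_{n+1})
= (e^2 - s^2) + 2\nu_{n+1}(e - s)
= (e-s)\,\tau_{n,e+s}.
```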
On the other hand, we clearly have \(\varsigma_n = \frac{w_n}{S_n}\) (since \(S_n = 2\eta_n\)), together with \(w_n = \varrho\nu_{n+1}(\nu_{n+1} + s)\), \(S_n = \varrho\rho_n\nu_n\nu_{n+1}\) and \(\frac{1}{\theta_n} = \frac{e + \nu_{n+1}}{\rho_n\nu_n}\) (from (A.13)), which gives us
Combining the last two results amounts to
This completes the proof.
Maingé, P.E., Labarre, F. Accelerated methods with fastly vanishing subgradients for structured non-smooth minimization. Numer. Algor. 90, 99–136 (2022). https://doi.org/10.1007/s11075-021-01181-y
Keywords
- Nesterov-type algorithm
- Inertial-type algorithm
- Global rate of convergence
- Non-smooth minimization
- Structured minimization
- Fast first-order method