
Accelerated methods with fastly vanishing subgradients for structured non-smooth minimization

  • Original Paper
Numerical Algorithms

Abstract

In a real Hilbert space, we study a new class of forward-backward algorithms for structured non-smooth minimization problems. For a special choice of the parameters, we recover the method AFB (Accelerated Forward-Backward), which was recently discussed as an enhanced variant of FISTA (Fast Iterative Soft Thresholding Algorithm). Our algorithms enjoy the well-known properties of AFB. Namely, they generate convergent sequences \((x_n)\) that minimize the function values at the rate \(o(n^{-2})\). Another important feature of our processes is that they can be regarded as discrete models suggested by first-order formulations of Newton-like dynamical systems. This permits us to extend to the non-smooth setting a property of fast convergence to zero of the gradients, established so far only for discrete Newton-like dynamics with smooth potentials. In particular, as a new result, we show that the latter property also applies to AFB. To prove this stability phenomenon, we develop a technical analysis that may also be useful for many other related developments. Numerical experiments are also performed to illustrate the properties of the considered algorithms in comparison with other existing ones.



Author information

Corresponding author

Correspondence to Paul-Emile Maingé.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1. Proof of Proposition 3.2


Let \(\{\kappa, e, \nu_n\}\) be positive parameters, and set \(\varrho = 1-\kappa\), \(\tau_n = e+\nu_{n+1}\) and \(u_n = y_n - x_n\). It is easily seen that (3.8a) can be rewritten as (for \(n \ge p\))

$$ \begin{array}{@{}rcl@{}} && \theta_n = \frac{1}{\tau_{n} } \left (\nu_{n} - {\varrho} \nu_{n+1} \right ), \end{array} $$
(A.1a)
$$ \begin{array}{@{}rcl@{}} & & \dot{x}_{n+1} + \chi^{*}_{n} + \theta_n u_n =0, \end{array} $$
(A.1b)
$$ \begin{array}{@{}rcl@{}} & & \dot{y}_{n+1} + \kappa u_{n} =0. \end{array} $$
(A.1c)

The rest of the proof is divided into the following parts (r1)–(r4):

(r1) An estimate from the inertial part of the method. Given \((s,q) \in [0,\infty ) \times {\mathcal H} \), we begin by proving that the discrete derivative \( \dot {G}_{n+1}(s,q)\) satisfies

$$ \begin{array}{l} \dot{G}_{n+1}(s,q) + s \tau_n \langle \chi^{*}_{n}, x_{n+1} -q \rangle =\\ {\kern48pt}- \left (s \nu_n+ {\varrho} \nu_{n+1}^{2} \right ) \langle \dot{x}_{n+1} , u_n\rangle \\ {\kern48pt}- \frac{1}{2} \left (\nu_n^{2} - {\varrho}^{2} \nu_{n+1}^{2}\right ) \| u_n \|^{2}- \frac{1}{2} \left (s e - \nu_{n+1}^{2} \right ) \|\dot{x}_{n+1} \|^{2} . \end{array} $$
(A.2)

In order to get this result, we readily notice that \(\dot {G}_{n+1}(s,q)\) can be formulated as

$$ \begin{array}{l} \dot{G}_{n+1}(s,q) = s (\dot{\nu}_{n+1} a_{n}+ \nu_{n+1}\dot{a}_{n+1} ) + s e \dot{b}_{n+1} + \nu^{2}_{n+1}\dot{c}_{n+1}+ {c}_{n} (\nu_{n+1}^{2} - {\nu_{n}^{2}}), \end{array} $$
(A.3)

where \(a_n := \langle q - x_n, u_n \rangle\), \(b_n := (1/2)\|x_n - q\|^{2}\) and \(c_n := (1/2)\|u_n\|^{2}\). For the sake of clarity, and so as to estimate the right side of (A.3), we set

$$ \begin{array}{@{}rcl@{}} && P_{n}= \langle q-x_{n+1} , \dot{x}_{n+1} \rangle , R_{n}= \langle q-x_{n+1} , \dot{y}_{n+1} \rangle~ \text{and} ~W_{n}=\langle \chi^{*}_{n}, x_{n+1} -q \rangle . \end{array} $$

Clearly, by \(a_n = \langle q - x_n, u_n \rangle\) and \(u_{n}=- \frac {1 }{\kappa } \dot {y}_{n+1} \) (from (A.1c)), we get

$$ \begin{array}{l} a_n= \langle q-x_{n+1} ,u_n\rangle + \langle \dot{x}_{n+1} , u_n\rangle = - \frac{1 }{\kappa} R_n + \langle \dot{x}_{n+1} , u_n\rangle . \end{array} $$
(A.4)

Again from \(a_n = \langle q - x_n, u_n \rangle\), and noticing \(\dot {u}_{n+1} = \dot {y}_{n+1} - \dot {x}_{n+1} \) (as \(u_n = y_n - x_n\)), we readily have

$$ \dot{a}_{n+1} = \langle -\dot{x}_{n+1} , u_n \rangle + \langle q -x_{n+1} , \dot{u}_{n+1} \rangle = - \langle \dot{x}_{n+1} , u_n \rangle - P_n + R_n. $$
(A.5)

Taking the scalar product of each side of (A.1b) with \(q-x_{n+1}\), along with \(u_{n}=- \frac {1 }{\kappa } \dot {y}_{n+1}\) (from (A.1c)), amounts to \(P_n - W_n = \kappa^{-1}\theta_n R_n\), which, by \( \theta _{n} = \tau _{n}^{-1} (\nu _{n} - {\varrho } \nu _{n+1})\) (from (A.1a)), is equivalent to

$$ (\nu_{n} - {\varrho} \nu_{n+1}) R_n= \kappa \tau_n(P_n - W_n). $$
(A.6)

Therefore, by (A.4), (A.5) and (A.6), and recalling that \(\varrho = 1-\kappa\), we are led to

$$ \begin{array}{@{}rcl@{}} &&\dot{\nu}_{n+1} a_n+ \nu_{n+1}\dot{a}_{n+1}\\ &=&\dot{\nu}_{n+1} \left ( \langle \dot{x}_{n+1} , u_n\rangle - \frac{1 }{\kappa} R_n\right ) + \nu_{n+1}\left (- \langle \dot{x}_{n+1} , u_n \rangle - P_n + R_n \right )\\ &=& (\dot{\nu}_{n+1}-\nu_{n+1} ) \langle \dot{x}_{n+1} , u_n \rangle - \nu_{n+1} P_n + \left (\nu_{n+1} - \frac{1 }{\kappa}\dot{\nu}_{n+1} \right ) R_n\\ &=& - \nu_n \langle \dot{x}_{n+1} , u_n \rangle - \nu_{n+1} P_n + \frac{1}{\kappa} \left (\nu_{n} - {\varrho} \nu_{n+1}\right ) R_n\\ &=& - \nu_n \langle \dot{x}_{n+1} , u_n \rangle - \nu_{n+1} P_n + \tau_n \left ( P_n - W_n \right ). \end{array} $$
(A.7)

From \(b_{n+1} = (1/2)\|x_{n+1} - q\|^{2}\), we readily get

$$ \begin{array}{l} \dot{b}_{n+1} =\frac{1}{2}\langle \dot{x}_{n+1} , x_{n+1} -q \rangle + \frac{1}{2}\langle x_n -q , \dot{x}_{n+1} \rangle =-P_n - \frac{1}{2} \| \dot{x}_{n+1} \|^{2}. \end{array} $$
(A.8)

In addition, by \(c_{n+1} = (1/2)\|u_{n+1}\|^{2}\), and \(\dot {u}_{n+1}= - \kappa u_{n} - \dot {x}_{n+1} \) (from \(u_n = y_n - x_n\) and \(\dot {y}_{n+1} =-\kappa u_{n}\)), we immediately have

$$ \begin{array}{@{}rcl@{}} \dot{c}_{n+1} &=& \frac{1}{2} \langle \dot{u}_{n+1}, {u}_{n+1} + u_n \rangle\\ &=&\langle - \dot{u}_{n+1}, -\frac{1}{2}\dot{u}_{n+1} -u_n\rangle\\ &=&\langle \kappa u_n + \dot{x}_{n+1} , \left (\frac{\kappa}{2} - 1 \right ) u_n + \frac{1}{2}\dot{x}_{n+1} \rangle\\ &=&\frac{1}{2} \| \dot{x}_{n+1} \|^{2}- \kappa \left (1- \frac{\kappa}{2} \right ) \|u_n\|^{2} - {\varrho} \langle \dot{x}_{n+1} ,u_n \rangle . \end{array} $$
(A.9)
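As a quick numerical sanity check of (A.8) and (A.9), the following Python sketch evaluates both sides of each identity on random vectors in \(\mathbb{R}^{d}\), used here as a finite-dimensional stand-in for \({\mathcal H}\). The function name, the dimension and the seed are illustrative assumptions and not part of the original proof; the discrete derivative is taken to be the forward difference \(\dot{x}_{n+1} = x_{n+1} - x_n\), consistent with its use in (A.4).

```python
import random

def dot(a, b):
    """Euclidean inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def residuals_a8_a9(kappa, dim=4, seed=0):
    """Return |LHS - RHS| for (A.8) and (A.9) on random data (hypothetical helper)."""
    rng = random.Random(seed)

    def rand_vec():
        return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    q, x_n, x_next, u_n = rand_vec(), rand_vec(), rand_vec(), rand_vec()
    dx = [a - b for a, b in zip(x_next, x_n)]          # \dot{x}_{n+1}
    # u_{n+1} = u_n + \dot{u}_{n+1}, with \dot{u}_{n+1} = -kappa*u_n - \dot{x}_{n+1}
    u_next = [(1.0 - kappa) * u - d for u, d in zip(u_n, dx)]

    def b(x):                                          # b_n = (1/2)||x_n - q||^2
        diff = [a - c for a, c in zip(x, q)]
        return 0.5 * dot(diff, diff)

    def c(u):                                          # c_n = (1/2)||u_n||^2
        return 0.5 * dot(u, u)

    P = dot([a - c_ for a, c_ in zip(q, x_next)], dx)  # P_n
    res_a8 = abs((b(x_next) - b(x_n)) - (-P - 0.5 * dot(dx, dx)))
    rhs_a9 = (0.5 * dot(dx, dx)
              - kappa * (1.0 - 0.5 * kappa) * dot(u_n, u_n)
              - (1.0 - kappa) * dot(dx, u_n))
    res_a9 = abs((c(u_next) - c(u_n)) - rhs_a9)
    return res_a8, res_a9
```

Both residuals should vanish up to floating-point rounding for any positive `kappa`; a nonzero residual would point to a sign or index slip in the transcription of the identities.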

In light of (A.3) together with (A.7), (A.8) and (A.9), we are led to

$$ \begin{array}{@{}rcl@{}} &&\dot{G}_{n+1}(s,q)\\ &&= s \left (- \nu_{n} \langle \dot{x}_{n+1} , u_{n} \rangle - \nu_{n+1} P_{n} + \tau_{n} \left ( P_{n} - W_{n} \right ) \right )\\ &&\quad+ se \left (-P_{n} - \frac{1}{2} \| \dot{x}_{n+1} \|^{2} \right ) \\ &&\quad+ \nu_{n+1}^{2} \left (\frac{1}{2} \| \dot{x}_{n+1} \|^{2}- \kappa \left (1- \frac{\kappa}{2} \right ) \|u_{n}\|^{2} - {\varrho} \langle \dot{x}_{n+1} ,u_{n} \rangle \right ) \\ &&\quad+ \frac{1}{2} (\nu_{n+1}^{2} - {\nu_{n}^{2}})\|u_{n}\|^{2}\\ \\ &&= -\left ( s\nu_{n} + {\varrho} \nu_{n+1}^{2} \right ) \langle \dot{x}_{n+1} ,u_{n} \rangle - \frac{1}{2} \left (se - \nu_{n+1}^{2} \right ) \|\dot{x}_{n+1} \|^{2} -\bar{\eta}_{n} \|u_{n}\|^{2} - s \tau_{n} W_{n}, \end{array} $$

where the quantity \( \bar {\eta }_{n}\) is given by

$$ \begin{array}{l} \bar{\eta}_{n}= \kappa \left (1- \frac{ \kappa}{2} \right )\nu_{n+1}^{2} - \frac{ 1 }{2}(\nu_{n+1}^{2} - {\nu_{n}^{2}}) \\ {\kern12.5pt}= \frac{1}{2} \left ({\nu_{n}^{2}}- \nu_{n+1}^{2} (1-\kappa)^{2}\right ) (\text{since}~ \kappa \left (1- \frac{ \kappa}{2} \right ) = \frac{1}{2} -\frac{1}{2}(1-\kappa)^{2} ). \end{array} $$

This leads to the desired result.

(r2) An estimate from the proximal part of the method. Let us establish that, for any \(\xi_n \neq 1\), it holds that

$$ \begin{array}{l} \xi_n \langle \chi_n^{*},\dot{x}_{n+1} \rangle + \frac{ 1 }{2} \| \dot{x}_{n+1} + \theta_n u_n\|^{2} \\ {}= \theta_n (1-\xi_n) \langle u_n,\dot{x}_{n+1} \rangle + \frac{1}{2} \theta_n^{2} \|u_n\|^{2} - \left (\xi_n- \frac{1}{2} \right ) \|\dot{x}_{n+1} \|^{2}. \end{array} $$
(A.10)

Indeed, we have \(\dot {x}_{n+1} =-\theta _n u_{n} -\chi _{n}^{*}\) (from (A.1b)), which, for any \(\xi_n \neq 1\), can be rewritten as

$$ \begin{array}{l} \xi_n \dot{x}_{n+1} = -(1-\xi_n) \left (\dot{x}_{n+1} + (1-\xi_n)^{-1}\theta_n u_n \right ) - \chi_n^{*} = -(1-\xi_n) H_n - \chi_n^{*} , \end{array} $$
(A.11)

where \(H_{n}= \dot {x}_{n+1} + (1-\xi _n)^{-1}\theta _n u_{n} \). Furthermore, by \(-\chi _{n}^{*}= \dot {x}_{n+1}+\theta _{n} u_{n}\) (again using (A.1b)) and denoting \(Q_{n}=\langle \dot {x}_{n+1}, u_{n}\rangle \), we simply obtain

$$ \begin{array}{l} \langle (-\chi_n^{*}), H_n \rangle = \langle \dot{x}_{n+1} + \theta_n u_n, \dot{x}_{n+1} + (1-\xi_n)^{-1}\theta_n u_n \rangle \\ {}= \|\dot{x}_{n+1} \|^{2} + (1-\xi_n)^{-1}\theta_n^{2} \|u_n\|^{2} + \frac{2-\xi_{n}}{(1-\xi_n)}\theta_n Q_{n}.\ \end{array} $$
(A.12)

Therefore, by taking the scalar product of each side of equality (A.11) with \(\chi _{n}^{*}\), adding \((1/2) \| \chi _{n}^{*}\|^{2}\) to both sides, and using (A.12) and \( \| \chi _{n}^{*}\|^{2}= \|\dot {x}_{n+1}+ \theta _{n} u_{n}\|^{2}\), we get

$$ \begin{array}{l} \xi_{n} \langle \chi_{n}^{*},\dot{x}_{n+1} \rangle + \frac{1}{2} \| \chi_{n}^{*}\|^{2} = (1-\xi_n) \langle (-\chi_{n}^{*}), H_{n} \rangle - \frac{1}{2} \| \chi_{n}^{*}\|^{2}\\ {\kern99pt}= (1-\xi_n) \left (\|\dot{x}_{n+1} \|^{2} + \frac{ \theta_n^{2} }{(1-\xi_n)} \|u_{n}\|^{2} + \frac{2-\xi_{n}}{(1-\xi_n)} \theta_n Q_{n} \right ) \\ {\kern110pt}- \frac{1}{2}\left (\|\dot{x}_{n+1} \|^{2} + \theta_n^{2} \|u_{n}\|^{2} + 2 \theta_n Q_{n}\right ) \\ {\kern99pt}= (1-\xi_{n}) \theta_n Q_{n} + \frac{1}{2}\theta_n^{2} \|u_{n}\|^{2} + \left (\frac{1}{2} -\xi_{n} \right ) \|\dot{x}_{n+1} \|^{2} . \end{array} $$

This yields (A.10).
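The identity (A.10) can also be checked numerically. The sketch below is an illustration under stated assumptions: vectors live in \(\mathbb{R}^{d}\) rather than a general Hilbert space, \(\chi_n^{*}\) is defined through (A.1b) as \(-(\dot{x}_{n+1} + \theta_n u_n)\), and the helper name and default arguments are hypothetical.

```python
import random

def dot(a, b):
    """Euclidean inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def residual_a10(theta, xi, dim=4, seed=0):
    """Return |LHS - RHS| of (A.10) for random u_n and \\dot{x}_{n+1}."""
    rng = random.Random(seed)
    u = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    dx = [rng.uniform(-1.0, 1.0) for _ in range(dim)]    # \dot{x}_{n+1}
    shifted = [d + theta * v for d, v in zip(dx, u)]     # \dot{x}_{n+1} + theta_n u_n
    chi = [-w for w in shifted]                          # chi_n^* from (A.1b)

    lhs = xi * dot(chi, dx) + 0.5 * dot(shifted, shifted)
    rhs = (theta * (1.0 - xi) * dot(u, dx)
           + 0.5 * theta ** 2 * dot(u, u)
           - (xi - 0.5) * dot(dx, dx))
    return abs(lhs - rhs)
```

For instance, `residual_a10(0.8, 0.3)` should be zero up to rounding error.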

(r3) Combining proximal and inertial effects. Let (3.10) hold and set \(\rho _{n}= 1- (1-\kappa ) \frac {\nu _{n+1}}{\nu _{n}}\). It is worthwhile noticing that the term \(\theta_n\) involved in (A.1) can be simply expressed as

$$ \theta_n =\frac{\nu_{n} \rho_{n} }{ \tau_{n}} ~\text{where}~ \tau_{n}=e+\nu_{n+1}. $$
(A.13)

So, by (A.13), in light of \(\rho_n > 0\) (from condition (3.3a)), we deduce that \((\theta_n)\) is a positive sequence.

Now, we introduce the real sequence \((\bar {\gamma }_{n})\) defined by

$$ \bar{\gamma}_{n}=1- \frac{s \rho_{n}}{\tau_{n}} ~(\text{with}~ s > 0). $$
(A.14)

By (A.14), along with \(\tau_n > 0\) (as \(\tau_n := e + \nu_{n+1}\)) and \(\rho_n > 0\), we have

$$ \bar{\gamma}_{n} < 1 ~(\text{for any}~ s >0). $$
(A.15)

Next, given s > 0, we show that the iterates produced by (3.8a) (or, equivalently, by (A.1)) verify

$$ \begin{array}{l} \dot{G}_{n+1}(s,q) + \frac{1}{2} \rho_n^{-1} \tau_n^{2} \|\dot{x}_{n+1} + \theta_n u_n\|^{2} \\ {}+ (s \tau_n)\langle \chi_n^{*}, x_{n+1} -q \rangle + \bar{\gamma}_{n} \rho_n^{-1}\tau_n^{2} \langle \chi_n^{*},\dot{x}_{n+1} \rangle = -T_n(u_n,\dot{x}_{n+1} ) , \end{array} $$
(A.16)

where \(T_n(u,x)\) is defined for any \((u,x) \in {\mathcal H}^{2}\) by

$$ T_{n}(u,x) = w_{n} \langle u, x \rangle +\eta_{n} \| u \|^{2}+ \sigma_{n} \|x\|^{2}, $$
(A.17)

together with the parameters

$$ \begin{array}{@{}rcl@{}} && {w}_{n} = {\varrho} \left (\nu_{n+1}^{2} + s \nu_{n+1} \right ), {\eta}_{n}= \frac{1}{2} {\varrho} \rho_{n} \nu_{n} \nu_{n+1} , \end{array} $$
(A.18a)
$$ \begin{array}{@{}rcl@{}} && {\sigma}_{n} =\frac{1}{2} \left (se - \nu_{n+1}^{2} + \rho_{n}^{-1} {\tau_{n}^{2}} \left (2 \bar{\gamma}_{n} - 1 \right ) \right ). \end{array} $$
(A.18b)

Indeed, by (A.2) and setting \(Q_{n}=\langle \dot {x}_{n+1} , u_{n} \rangle \), we know that

$$ \begin{array}{l} \dot{G}_{n+1}(s,q) + (s \tau_n)\langle \chi^{*}_{n}, x_{n+1} -q \rangle = \\ {\kern12pt}- \left (s \nu_n+ {\varrho} \nu_{n+1}^{2} \right ) Q_n - \frac{1}{2} \left (\nu_n^{2} - {\varrho}^{2} \nu_{n+1}^{2}\right ) \| u_n \|^{2} - \frac{1}{2} \left (s e - \nu_{n+1}^{2} \right ) \|\dot{x}_{n+1} \|^{2}. \end{array} $$
(A.19)

Moreover, in light of \(\bar {\gamma }_{n} \neq 1\) (from (A.15)), by using (A.10) (with the special value \(\xi _{n}= \bar{\gamma}_{n}:=1- \frac {s \rho _{n}}{\tau _{n}}\)) and recalling that \(\theta _n = \frac {\nu _{n} \rho _{n}}{\tau _{n}}\), we obtain

$$ \begin{array}{l} \bar{\gamma}_{n} \langle \chi_{n}^{*},\dot{x}_{n+1} \rangle + \frac{ 1 }{2} \|\dot{x}_{n+1} + \theta_n u_{n}\|^{2}\\ {}= s \frac{\nu_{n} {\rho_{n}^{2}}}{{\tau_{n}^{2}}} Q_{n}+ \frac{1}{2} \frac{{\nu_{n}^{2}} {\rho_{n}^{2}}}{{\tau_{n}^{2}}} \|u_{n}\|^{2} - \left (\bar{\gamma}_{n} -\frac{1}{2} \right ) \|\dot{x}_{n+1} \|^{2} . \end{array} $$
(A.20)

Then, multiplying equality (A.20) by \(\rho _{n}^{-1} {\tau _{n}^{2}}\), and adding the resulting equality to (A.19), we get

$$ \begin{array}{l} \dot{G}_{n+1}(s,q) + (s \tau_{n})\langle \chi^{*}_{n}, x_{n+1} -q \rangle \\ {\kern12pt}+ \bar{\gamma}_{n} \rho_{n}^{-1} {\tau_{n}^{2}} \langle \chi_{n}^{*},\dot{x}_{n+1} \rangle + \frac{1}{2} \rho_{n}^{-1} {\tau_{n}^{2}} \|\dot{x}_{n+1} + \theta_n u_{n}\|^{2}\\ {}= \left (- \left (s \nu_{n}+ {\varrho} \nu_{n+1}^{2} \right ) + s \nu_{n} \rho_{n} \right ) Q_{n} \\ {\kern12pt}+ \left (- \frac{1}{2} \left ({\nu_{n}^{2}} - {\varrho}^{2} \nu_{n+1}^{2}\right ) + \frac{1}{2} {\nu_{n}^{2}} \rho_{n} \right ) \| u_{n} \|^{2} \\ {\kern12pt}- \left ( \frac{1}{2} \left (s e - \nu_{n+1}^{2} \right ) + \rho_{n}^{-1} {\tau_{n}^{2}}\left (\bar{\gamma}_{n} -\frac{1}{2} \right ) \right ) \|\dot{x}_{n+1} \|^{2} . \end{array} $$

Hence, noticing that \(\nu_n \rho_n = \nu_n - \varrho \nu_{n+1}\), we infer that (A.16)–(A.17) is actually satisfied together with the parameters

$$ \begin{array}{l} {w}_{n} = \left (s \nu_{n} + {\varrho} \nu_{n+1}^{2} \right ) - s (\nu_{n} -{\varrho} \nu_{n+1}) = {\varrho} \left (\nu_{n+1}^{2} + s \nu_{n+1} \right ), \\ {\eta}_{n}= \frac{1}{2} \left ({\nu_{n}^{2}} - \nu_{n+1}^{2} {\varrho}^{2} \right ) - \frac{1}{2}\left ({\nu_{n}^{2}}- {\varrho} \nu_{n} \nu_{n+1}\right ) = \frac{1}{2} {\varrho} \nu_{n+1} \left ( \nu_{n} - \nu_{n+1} {\varrho} \right) = \frac{1}{2} {\varrho} \nu_{n+1} \nu_{n} \rho_{n}, \\ {\sigma}_{n} = \frac{1}{2} \left (se - \nu_{n+1}^{2} \right ) + \rho_{n}^{-1} {\tau_{n}^{2}} \left (\bar{\gamma}_{n} - \frac{1}{2} \right ) = \frac{1}{2} \left (se -\nu_{n+1}^{2} + \rho_{n}^{-1} {\tau_{n}^{2}} \left (2 \bar{\gamma}_{n} - 1 \right ) \right ) . \end{array} $$

This leads to the desired result.
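The simplifications of \(w_n\), \(\eta_n\) and \(\sigma_n\) in (r3) are scalar identities, so they can be spot-checked numerically. The Python sketch below compares the unsimplified coefficients, read off from the sum of (A.19) and \(\rho_n^{-1}\tau_n^{2}\) times (A.20), with the closed forms (A.18a)–(A.18b). The function name and the sample parameter values are assumptions made for illustration only.

```python
def r3_coefficients(kappa, e, s, nu_n, nu_next):
    """Return (raw, closed) triples (w_n, eta_n, sigma_n); hypothetical helper."""
    varrho = 1.0 - kappa                           # \varrho = 1 - kappa
    rho_n = 1.0 - varrho * nu_next / nu_n          # rho_n, assumed nonzero
    tau_n = e + nu_next                            # tau_n = e + nu_{n+1}
    gamma_bar = 1.0 - s * rho_n / tau_n            # (A.14)

    # Coefficients before simplification (signs chosen so that RHS = -T_n).
    w_raw = (s * nu_n + varrho * nu_next ** 2) - s * nu_n * rho_n
    eta_raw = 0.5 * (nu_n ** 2 - varrho ** 2 * nu_next ** 2) - 0.5 * nu_n ** 2 * rho_n
    sigma_raw = 0.5 * (s * e - nu_next ** 2) + tau_n ** 2 / rho_n * (gamma_bar - 0.5)

    # Closed forms (A.18a)-(A.18b).
    w = varrho * (nu_next ** 2 + s * nu_next)
    eta = 0.5 * varrho * rho_n * nu_n * nu_next
    sigma = 0.5 * (s * e - nu_next ** 2 + tau_n ** 2 / rho_n * (2.0 * gamma_bar - 1.0))
    return (w_raw, eta_raw, sigma_raw), (w, eta, sigma)
```

Calling `r3_coefficients(0.6, 2.0, 1.5, 3.0, 3.5)` should return two triples that agree up to rounding.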

(r4) Finally, we give an alternative formulation of the quantity \(T_n(u,x)\) given by (A.17)–(A.18). For this purpose, we begin by reformulating \(\sigma_n\). By the definitions \(\tau_n := e + \nu_{n+1}\), \(\bar {\gamma }_{n}:=1-s \frac {\rho _{n}}{\tau _{n}}\), and by an easy computation we have

$$ \begin{array}{l} \rho_{n}^{-1}{\tau_{n}^{2}} (2 \bar{\gamma}_{n} -1) = \frac{(e+ \nu_{n+1})^{2}}{ \rho_{n}} \left (1- 2s \frac{\rho_{n}}{(e+ \nu_{n+1})} \right ) \\ {\kern69pt}= \frac{1 }{\rho_{n}} \left (e^{2}+ 2 e \nu_{n+1} + (\nu_{n+1})^{2} \right )- 2s \left (e+ \nu_{n+1} \right ) \\ {\kern69pt}= e \left (\frac{e }{\rho_{n} } -s \right ) - s e + 2 \nu_{n+1} \left (\frac{e }{\rho_{n} } -s \right ) + \frac{(\nu_{n+1} )^{2} }{\rho_{n} } \\ {\kern69pt}= \left (e + 2 \nu_{n+1}\right ) \left (\frac{e }{\rho_{n} } -s \right ) - s e + \frac{(\nu_{n+1})^{2} }{\rho_{n} } \\ {\kern69pt}= \tau_{n,e} \left (e \rho_{n}^{-1} -s \right ) - s e + \rho_{n}^{-1} (\nu_{n+1} )^{2}, \end{array} $$

where \(\tau_{n,t} = t + 2\nu_{n+1}\) (for \(t \ge 0\)). As a consequence, by the previous definition of \(\sigma_n\) (in (A.18)), we obtain

$$ \begin{array}{l} 2 {\sigma}_{n}= (\rho_n^{-1}-1) (\nu_{n+1} )^{2} + \tau_{n,e} \left (e \rho_n^{-1}-s \right )\\ {\kern18pt}= \left ((\nu_{n+1} )^{2}+ e \tau_{n,e} \right ) \left (\rho_n^{-1} -1 \right ) + \tau_{n,e} (e -s). \end{array} $$
(A.21)
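The two expressions for \(2\sigma_n\) in (A.21) can be compared with the definition (A.18b) on sample values. The following Python sketch does so; the function name and the sample inputs are illustrative assumptions, and \(\rho_n\) is assumed to be nonzero.

```python
def two_sigma_forms(kappa, e, s, nu_n, nu_next):
    """Return 2*sigma_n computed from (A.18b) and from both lines of (A.21)."""
    varrho = 1.0 - kappa
    rho_n = 1.0 - varrho * nu_next / nu_n       # rho_n, assumed nonzero
    tau_n = e + nu_next                         # tau_n
    tau_ne = e + 2.0 * nu_next                  # tau_{n,e} = e + 2 nu_{n+1}
    gamma_bar = 1.0 - s * rho_n / tau_n         # (A.14)

    from_def = s * e - nu_next ** 2 + tau_n ** 2 / rho_n * (2.0 * gamma_bar - 1.0)
    first = (1.0 / rho_n - 1.0) * nu_next ** 2 + tau_ne * (e / rho_n - s)
    second = (nu_next ** 2 + e * tau_ne) * (1.0 / rho_n - 1.0) + tau_ne * (e - s)
    return from_def, first, second
```

The three returned numbers should coincide up to rounding, for example with `two_sigma_forms(0.6, 2.0, 1.5, 3.0, 3.5)`.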

Then we consider the following two situations relative to the constant \(\kappa\):

- In the special case when \(\kappa = 1\) (hence \(\varrho = 0\), \(\rho_n = 1\) and \( \rho _{n}^{-1}=1\)), we obviously have \(w_n = 0\) and \(\eta_n = 0\). Then, for \((u,x) \in {\mathcal H}^{2}\), by definition of \(T_n\) (in (A.17)) along with \( {\sigma }_{n}= \frac {\left (e -s \right )}{2} \tau _{n,e} \) (from (A.21)) we obtain

$$ T_{n}(u,x)= \frac{\left (e -s \right )}{2} \tau_{n,e} \|x\|^{2}. $$
(A.22)

- For \(\kappa \in (0,1) \cup (1,\infty )\) (hence \(\eta_n \neq 0\)), also setting \({\varsigma }_{n}:=\frac {w_{n}}{2 \eta _{n}}\), and \(\psi _{n}:= 4 {\sigma }_{n} {\eta }_{n}- {w}_{n}^{2}\), by definition of \(T_n\) (in (A.17)) we classically have

$$ T_{n}(u,x)= \eta_{n} \|u+{\varsigma}_{n}x \|^{2}+ \frac{\psi_{n}}{4 \eta_{n} } \|x\|^{2}. $$
(A.23)

On the one hand, by \({w}_{n} = {\varrho } \nu _{n+1} \left (\nu _{n+1} + s \right )\) (from (A.18)) and remembering that \(\tau_{n,s} = s + 2\nu_{n+1}\), we simply have \({w}_{n}^{2} = ({\varrho } \nu _{n+1})^{2} \left ((\nu _{n+1} )^{2} + s \tau _{n,s} \right )\). Hence, by (A.21) while using the definition of \(\psi_n\), and setting \(S_n := \varrho \rho_n \nu_n \nu_{n+1}\) (so that \(S_n = 2\eta_n\) and \(\psi _{n}= 2 {\sigma }_{n} S_{n}- {w}_{n}^{2}\)), we obtain

$$ \psi_{n}= S_{n} \left ((\nu_{n+1} )^{2}+ e \tau_{n,e} \right ) \left (\rho_{n}^{-1} -1 \right ) + S_{n} \tau_{n,e} (e -s) - ({\varrho} \nu_{n+1})^{2} \left ( (\nu_{n+1} )^{2} + s \tau_{n,s} \right).$$

It is also easily checked that \(S_{n} \left (\rho _{n}^{-1} -1 \right ) = ({\varrho } \nu _{n+1})^{2} \), which by the previous equality yields

$$ \begin{array}{l} \psi_{n}= S_{n} \tau_{n,e} \left (e-s \right ) + ({\varrho} \nu_{n+1})^{2} \left (e \tau_{n,e} - s \tau_{n,s} \right ). \end{array} $$
(A.24)

Then, noticing that \(e\tau_{n,e} - s\tau_{n,s} = (e-s)\tau_{n,e+s}\) (as \(\tau_{n,t} := t + 2\nu_{n+1}\), for \(t \ge 0\)), we infer that \( \psi _{n} = \left (e-s \right ) \left (S_{n} \tau _{n,e} + ({\varrho } \nu _{n+1})^{2} \tau _{n,e+s} \right )\), which by (A.23) entails that

$$ T_{n}(u,x)= \frac{1}{2} S_{n} \|u+{\varsigma}_{n} x \|^{2}+ \frac{\left (e-s \right )}{2 S_{n}} \left ( S_{n} \tau_{n,e} + ({\varrho} \nu_{n+1})^{2} \tau_{n,e+s} \right ) \|x\|^{2}. $$

On the other hand, we clearly have \({\varsigma }_{n}=\frac {w_{n}}{S_{n}}\) (since \(S_n = 2\eta_n\)), together with \( {w}_{n} = {\varrho } \nu _{n+1} \left (\nu _{n+1} + s \right ) \), \(S_n = \varrho \rho_n \nu_n \nu_{n+1}\) and \( \frac {1}{\theta _{n}}=\frac { e+ \nu _{n+1}}{ \rho _{n} \nu _{n}}\) (from (A.13)), which gives us

$$ \begin{array}{l} {\varsigma}_{n}= \frac{ \nu_{n+1} + s }{ \rho_{n} \nu_{n} } = \frac{ \left (e+ \nu_{n+1} \right ) -(e- s) }{ \rho_{n} \nu_{n} } = \frac{1}{\theta_n } - \frac{ (e-s) }{ \nu_{n} \rho_{n} }. \end{array} $$

Combining the last two results amounts to

$$ T_{n}(u,x)= \frac{1}{2} S_{n} \left \|u+ \left (\frac{1}{\theta_n } - \frac{ e-s }{ \nu_{n} \rho_{n} } \right ) x \right \|^{2}+ \frac{\left (e-s \right )}{2} \left ( \tau_{n,e} + \frac{({\varrho} \nu_{n+1})^{2}}{S_{n}} \tau_{n,e+s} \right ) \|x\|^{2}. $$

This completes the proof.
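The final expression for \(T_n(u,x)\) can be compared numerically with its definition (A.17)–(A.18) in the case \(\kappa \neq 1\). The Python sketch below is a minimal illustration, assuming vectors in \(\mathbb{R}^{d}\) and \(\rho_n \neq 0\); the function name, default dimension and seed are hypothetical choices, not part of the paper.

```python
import random

def dot(a, b):
    """Euclidean inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def t_n_two_ways(kappa, e, s, nu_n, nu_next, dim=4, seed=0):
    """Return T_n(u, x) from (A.17)-(A.18) and from the final formula of (r4)."""
    rng = random.Random(seed)
    u = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    varrho = 1.0 - kappa                            # requires kappa != 1
    rho_n = 1.0 - varrho * nu_next / nu_n           # assumed nonzero
    tau_n = e + nu_next
    gamma_bar = 1.0 - s * rho_n / tau_n

    w = varrho * (nu_next ** 2 + s * nu_next)                        # (A.18a)
    eta = 0.5 * varrho * rho_n * nu_n * nu_next                      # (A.18a)
    sigma = 0.5 * (s * e - nu_next ** 2
                   + tau_n ** 2 / rho_n * (2.0 * gamma_bar - 1.0))   # (A.18b)
    direct = w * dot(u, x) + eta * dot(u, u) + sigma * dot(x, x)     # (A.17)

    S = 2.0 * eta                                   # S_n
    theta = nu_n * rho_n / tau_n                    # (A.13)
    shift = 1.0 / theta - (e - s) / (nu_n * rho_n)  # varsigma_n

    def tau(t):                                     # tau_{n,t} = t + 2 nu_{n+1}
        return t + 2.0 * nu_next

    v = [a + shift * b for a, b in zip(u, x)]
    final = (0.5 * S * dot(v, v)
             + 0.5 * (e - s) * (tau(e) + (varrho * nu_next) ** 2 / S * tau(e + s))
             * dot(x, x))
    return direct, final
```

For example, `t_n_two_ways(0.6, 2.0, 1.5, 3.0, 3.5)` should return two values that agree up to rounding; the same holds for \(\kappa > 1\), where \(S_n\) is negative.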


Cite this article

Maingé, PE., Labarre, F. Accelerated methods with fastly vanishing subgradients for structured non-smooth minimization. Numer Algor 90, 99–136 (2022). https://doi.org/10.1007/s11075-021-01181-y


  • DOI: https://doi.org/10.1007/s11075-021-01181-y
