Abstract
In this paper, we develop rapidly convergent forward–backward algorithms for computing zeroes of the sum of two maximally monotone operators. A modification of the classical forward–backward method is considered, by incorporating an inertial term (close to the acceleration techniques introduced by Nesterov), a constant relaxation factor and a correction term, along with a preconditioning process. In a Hilbert space setting, we prove the weak convergence to equilibria of the iterates \((x_n)\), with worst-case rates of \( o(n^{-1})\) in terms of both the discrete velocity and the fixed point residual, instead of the rates of \(\mathcal {O}(n^{-1/2})\) classically established for related algorithms. Our procedure can also be adapted to more general monotone inclusions. In particular, we propose a fast primal-dual algorithmic solution to a class of convex-concave saddle point problems. In addition, we provide a well-adapted framework for solving this class of problems by means of standard proximal-like algorithms dedicated to structured monotone inclusions. Numerical experiments are also performed to illustrate the efficiency of the proposed strategy.
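To fix ideas, the following is a minimal, schematic sketch of a plain inertial (Nesterov-type) forward–backward iteration on a toy composite problem. It is not the paper's CRIFBA scheme (which additionally involves a relaxation factor, a correction term and preconditioning); the function names and the FISTA-style extrapolation rule are illustrative assumptions only.

```python
import numpy as np

def prox_l1(x, lam):
    """Proximal operator of lam*||x||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def inertial_forward_backward(grad_f, prox_g, x0, step, n_iter=200):
    """Schematic inertial forward-backward iteration:
        x_{n+1} = prox_{step*g}( y_n - step * grad_f(y_n) ),
        y_{n+1} = x_{n+1} + theta_n (x_{n+1} - x_n),
    with the classical FISTA extrapolation theta_n = (t_n - 1)/t_{n+1}.
    """
    x_prev = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(n_iter):
        x = prox_g(y - step * grad_f(y), step)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)
        x_prev, t = x, t_next
    return x_prev

# Toy composite problem: minimize 0.5*||x - b||^2 + ||x||_1,
# whose closed-form minimizer is the soft-thresholding of b.
b = np.array([3.0, -0.5, 2.0])
sol = inertial_forward_backward(lambda x: x - b, prox_l1, np.zeros(3), step=1.0)
```

Here \(\nabla f\) is 1-Lipschitz, so the unit step size is admissible; the iterates reach the minimizer \((2, 0, 1)\) of the toy problem.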
Data availability
We do not analyse or generate any datasets, since this work is purely theoretical. The relevant materials can be obtained from the references below.
References
Attouch, H., Cabot, A.: Convergence of a relaxed inertial forward–backward algorithm for structured monotone inclusions. Appl. Math. Optim. 80, 547–598 (2019)
Attouch, H., László, S.C.: Newton-like inertial dynamics and proximal algorithms governed by maximally monotone operators. SIAM J. Optim. 30(4), 3252–3283 (2020)
Attouch, H., Peypouquet, J.: The rate of convergence of Nesterov’s accelerated forward–backward method is actually faster than \(1/k^{2}\). SIAM J. Optim. 26(3), 1824–1834 (2016)
Attouch, H., Peypouquet, J.: Convergence of inertial dynamics and proximal algorithms governed by maximal monotone operators. Math. Program. 174, 391–432 (2019)
Attouch, H., Peypouquet, J., Redont, P.: Fast convex optimization via inertial dynamics with Hessian-driven damping. J. Differ. Equ. 261(10), 5734–5783 (2016)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2017)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Boţ, R.I., Sedlmayer, M., Vuong, P.T.: A relaxed inertial forward–backward–forward algorithm for solving monotone inclusions with application to GANs. arXiv:2003.07886 (2020)
Brezis, H.: Opérateurs Maximaux Monotones. Math. Stud., vol. 5. North-Holland, Amsterdam (1973)
Brezis, H., Lions, P.L.: Produits infinis de résolvantes. Isr. J. Math. 29, 329–345 (1978)
Cevher, V., Vu, B.C.: A reflected forward–backward splitting method for monotone inclusions involving Lipschitzian operators. Set-Valued Var. Anal. (2020). https://doi.org/10.1007/s11228-020-00542-4
Chambolle, A., Dossal, C.: On the convergence of the iterates of FISTA. J. Optim. Theory Appl. 166(3), 968–982 (2015)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
Combettes, P.L., Pesquet, J.-C.: Primal-dual splitting algorithm for solving inclusions with mixtures of composite, Lipschitzian, and parallel-sum type monotone operators. Set-Valued Var. Anal. 20(2), 307–330 (2012)
Combettes, P.L., Wajs, V.: Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005)
Condat, L.: A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms. J. Optim. Theory Appl. 158(2), 460–479 (2013)
Corman, E., Yuan, X.: A generalized proximal point algorithm and its convergence rate. SIAM J. Optim. 24(4), 1614–1638 (2014)
Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 57, 1413–1457 (2004)
Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82, 421–439 (1956)
Drori, Y., Teboulle, M.: Performance of first-order methods for smooth convex minimization: a novel approach. Math. Program. 145(1–2), 451–482 (2014)
Duchi, J., Singer, Y.: Efficient online and batch learning using forward backward splitting. J. Mach. Learn. Res. 10, 2899–2934 (2009)
Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55(1–3), 293–318 (1992)
Güler, O.: On the convergence of the proximal point algorithm for convex minimization. SIAM J. Control Optim. 29, 403–419 (1991)
Güler, O.: New proximal point algorithms for convex minimization. SIAM J. Optim. 2(4), 649–664 (1992)
Kim, D.: Accelerated proximal point method for maximally monotone operators. Math. Program. 190, 57–87 (2021). https://doi.org/10.1007/s10107-021-01643-0
Labarre, F.: Approche numérique de problèmes d’optimisation et applications. PhD thesis, University of Antilles (2021)
Lemaire, B.: The proximal algorithm. In: Penot, J.P. (ed.) New Methods in Optimization and Their Industrial Uses. Internat. Ser. Numer. Math., vol. 87, pp. 73–87. Birkhauser, Basel (1989)
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 964–979 (1979)
Lorenz, D.A., Pock, T.: An inertial forward–backward algorithm for monotone inclusions. J. Math. Imaging Vis. 51, 311–325 (2015)
Maingé, P.E.: Accelerated proximal algorithms with a correction term for monotone inclusions. Appl. Math. Optim. 84, 2027–2061 (2021)
Maingé, P.E.: Fast convergence of generalized forward–backward algorithms for structured monotone inclusions. J. Convex Anal. 29, 893–920 (2022)
Maingé, P.E., Labarre, F.: Accelerated methods with fastly vanishing subgradients for structured non-smooth minimization. Numer. Algorithms (2021). https://doi.org/10.1007/s11075-021-01181-y
Maingé, P.E., Weng-Law, A.: Fast continuous dynamics inside the graph of maximally monotone operators. Set-Valued Var. Anal. (2023). https://doi.org/10.1007/s11228-023-00663-6
Martinet, B.: Régularisation d’inéquations variationnelles par approximations successives. Rev. Fr. Infor. Rech. Opération. 4, 154–158 (1970)
Moudafi, A., Oliny, M.: Convergence of a splitting inertial proximal method for monotone operators. J. Comput. Appl. Math. 155(2), 447–454 (2003)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^{2})\). Soviet Math. Doklady 27, 372–376 (1983)
Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence \(O(1/k^{2})\). Dokl. Akad. Nauk. USSR 269(3), 543–547 (1983)
Nesterov, Y.: Gradient methods for minimizing composite objective function. Math. Program. Ser. B 140, 125–161 (2013). https://doi.org/10.1007/s10107-012-0629-5
O’Donoghue, B., Candès, E.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15(3), 715–732 (2015). https://doi.org/10.1007/s10208-013-9150-3
Passty, G.B.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72, 383–390 (1979)
Peaceman, D.W., Rachford, H.H.: The numerical solution of parabolic and elliptic differential equations. J. Soc. Ind. Appl. Math. 3(1), 28–41 (1955)
Raguet, H., Fadili, J., Peyré, G.: A generalized forward–backward splitting. SIAM J. Imaging Sci. 6, 1199–1226 (2013)
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, Berlin (1998)
Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60(1–4), 259–268 (1992)
Sebestyen, Z., Tarcsay, Z.: On the square root of a positive selfadjoint operator. Period. Math. Hung. 75, 268–272 (2017)
Shi, B., Du, S.S., Jordan, M.I., Su, W.J.: Understanding the acceleration phenomenon via high-resolution differential equations (2018). https://doi.org/10.13140/RG.2.2.20063.92329
Tseng, P.: A modified forward–backward splitting method for maximal monotone mappings. SIAM J. Control Optim. 38(2), 431–446 (2000)
Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward–backward algorithms. SIAM J. Optim. 23(3), 1607–1633 (2013)
Vu, B.: A splitting algorithm for dual monotone inclusions involving cocoercive operators. Adv. Comput. Math. 38(3), 667–681 (2013)
Appendix
1.1 Appendix A.1: Proof of Lemma 2.2
The proof is divided into several parts.
1.1.1 Appendix A.1.1: Proof of Lemma 2.2 (part 1): an estimation from a more general model
The next proposition was established in [32, Proposition 3.2].
Proposition A.1
Let \(\{ x_n, y_n, d_n\} \subset \mathcal {H}\) verify (2.16) along with \(\{e_*, \kappa , \nu _n\} \subset (0,\infty )\), and suppose that condition (2.11a) holds (for some integer \(n_1\)). Then for \((s,q) \in ( 0, \infty ) \times \mathcal {H}\) and for \(n \ge n_1\) we have
where \(\rho _n\) and \({T} _{n}(s)\) are given by (2.9) and (2.10), respectively, while \(G_n(s,q)\) is defined by
1.1.2 Appendix A.1.2: Proof of Lemma 2.2 (part 2)
We now apply Proposition A.1 to obtain estimates for our proposed model. The following lemma is stated in the general setting of the parameter \(\gamma _n\):
Lemma A.2
Let \(\{ z_n, x_n, y_n \} \subset \mathcal {H}\) be generated by (1.8) along with \((\theta _n)\) given in (1.9) and parameters \(\{\gamma _n, e_*, \kappa , t, \mu , \nu _n \} \subset (0,\infty )\) verifying (2.11a) (for some integer \(n_1\)). Then, for any \((s,q) \in (0, \infty ) \times \mathcal {H}\) and for \(n \ge n_1\), we have
where \(\vartheta _{n}=1-s\rho _n (e_*+\nu _{n+1}) ^{-1}\) and \(Z_n=\langle t G (z_n)- t G (z_{n-1}),\dot{x}_{n+1} \rangle \).
Proof
It can be observed (in light of Remark 2.2) that the iterates \(\{z_n, x_n, y_n \}\) generated by CRIFBA fall within the special case of algorithm (2.16) when \(d_n= t (G z_n)- t \gamma _n (G z_{n-1})\). Hence by Proposition A.1, and denoting \(W_n= \Vert \dot{x}_{n+1} + \theta _n (y_{n} -x_{n} )\Vert ^2\) (for the sake of simplicity), we obtain
where \(\tau _n=e_*+ \nu _{n+1}\) and \(\vartheta _{n}= 1 -s \frac{\rho _n}{\tau _n}\) (\(\vartheta _{n}\) being also depending on s). Moreover, setting \( U_{n} =\langle t G(z_{n-1}), x_{n} -q\rangle \) and \(H_n=\langle t G (z_{n-1}), \dot{x}_{n+1} \rangle \) we have
while denoting \(Z_n= \langle t G (z_n) - t G (z_{n-1}),\dot{x}_{n+1} \rangle \) (hence \(\langle G(z_n),\dot{x}_{n+1} \rangle = Z_n + H_n\)) gives us
Then, using the previous two results and checking that \( s\tau _n + \vartheta _{n}\rho _n^{-1} \tau _n^2= \rho _n^{-1} \tau _n^2\) amounts to
Thus, in light of (A.4) along with the previous equality, we infer that
It is also simply seen that
which by (A.6) and noticing that \({\mathcal E}_{n}(s,q)={G}_{n}(s,q) + s \tau _{n-1} U_{n}\) leads to
This completes the proof. \(\square \)
1.1.3 Appendix A.1.3: Proof of Lemma 2.2 (Part 3)
We prove the two results (2.12) and (2.14) successively.
Let us begin with proving (2.12). Suppose that (2.11a) holds (for some integer \(n_1\)). For \(n \ge n_1\), by \(\vartheta _{n}= 1-s \rho _n (e_*+\nu _{n+1})^{-1}\) and \(\gamma _n= 1-s_0 \rho _n (e_*+\nu _{n+1})^{-1}\), we readily have
Consequently, by (A.3) and (A.7a) we are led to (2.12), namely
where we recall that \(Z_n=\langle t G (z_n)- t G (z_{n-1}),\dot{x}_{n+1} \rangle \).
Now we prove (2.14). Suppose that (2.11b) holds (for some large enough integer \(n_1\) and some value \(\theta \)) and that (2.13) is satisfied. From the condition \(\nu _{n+1} \sim \nu _n\) (as \(n \rightarrow \infty \)) we clearly have \(\lim _{n \rightarrow \infty } (1-\kappa ) \frac{\nu _{n+1}}{\nu _n}=1-\kappa <1\). So, for \(n \ge n_1\) (with \(n_1\) large enough) we simply deduce \((1-\kappa )\frac{\nu _{n+1}}{\nu _n}<1\). It follows that (2.11a) is verified (for some integer \(n_1\)). Hence for \(n \ge n_1\), by the previous arguments we deduce that (A.8) holds. Thus, taking \(s=s_0\) in (A.8) and noticing that \(\vartheta _{n}\) reduces to \(\gamma _n\), we get
We proceed by estimating the right-hand side of the previous equality. Clearly, by definition of \(T_n(s_0)\) we have
Moreover, given \(\varepsilon >0\), thanks to Young’s inequality we classically obtain
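For reference, the weighted form of Young's inequality (also called the Peter–Paul inequality) invoked at this step is the standard fact:

```latex
% Weighted Young (Peter--Paul) inequality in a Hilbert space:
\[
  \langle a, b \rangle \;\le\; \frac{\varepsilon }{2}\,\Vert a\Vert ^{2}
  \;+\; \frac{1}{2\varepsilon }\,\Vert b\Vert ^{2},
  \qquad \text{for all } a, b \in \mathcal {H},\ \varepsilon > 0,
\]
% which follows from expanding
% $0 \le \Vert \sqrt{\varepsilon }\, a - \varepsilon ^{-1/2} b \Vert ^{2}$.
```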
hence, noticing that \(\theta _n = \frac{\nu _n \rho _n}{e_*+\nu _{n+1} }\), we obtain
So, combining this last result with (A.10) yields
It is also readily seen that \(1-\rho _n=(1-\kappa )\nu _{n+1}\nu _n^{-1}\), so that the previous inequality can be rewritten as
or equivalently
where the quantities \(C_1(\varepsilon , \kappa )\) and \(C_2(\varepsilon , \kappa )\) are defined by
It is clear that for any given \(\kappa \in (0,2)\) we have \(|1-\kappa |<1\). Note that \(\kappa =1\) yields \(C_1(\varepsilon ,1)=C_2(\varepsilon ,1)=0\), while, for \(\kappa \ne 1\), by considering positive values \(\varepsilon _0\) and \(\theta \in (0,\frac{1}{2})\) such that \( \varepsilon _0 = \frac{1}{2} \theta \frac{1-|1-\kappa |}{|1-\kappa | }\), we are led to
Then, for \(\kappa \in (0,2)\) and for \(n \ge n_1\), by inequality (A.11) we deduce that
where \( \bar{C}_1\) and \(\bar{C}_2\) are given by
Consequently, for \(n\ge n_1\), by (A.9) in light of (A.14) we infer that
It remains to estimate the terms \(\big ( 1- \bar{C}_1 \frac{\nu _{n+1}}{\nu _n} \big )\) and \(\big ( e_*+2 \nu _{n+1} - \bar{C}_2 \frac{\nu _{n+1}}{\nu _n} \big )\) arising in the previous inequality. Observe that, given \(\kappa \in (0,2)\), by the condition \(\nu _{n+1} \sim \nu _n\) (as \(n \rightarrow \infty \)) we have
hence, for \(n \ge n_1\) (with \(n_1\) large enough) we get
It can also be checked without difficulty that the following equivalences hold
Then, for \(n \ge n_1\), by (A.17) and (A.18) we get
Moreover, for \(n \ge n_1\), it is not difficult to see that (2.11b) can be reformulated as
As a result, for \(n \ge n_1\), and letting c be introduced in (2.15), by (A.15a) in light of (A.19) and (A.20) we conclude that
which, by (A.16), amounts to (2.14). \(\square \)
1.2 Appendix A.2: Proof of Proposition 3.1
For any \(x,y \in \mathcal {H}\), by definition of \(\langle \cdot ,\cdot \rangle _M\) along with \({\mathcal B}:= M^{-1}B\) we readily have
hence condition (1.2b) equivalently writes
By the same arguments as above, we additionally have
which can be rewritten as
where \(L^{\frac{1}{2}}\) and \(L^{-\frac{1}{2}}\) are the square roots of the positive definite self-adjoint maps L and \(L^{-1}\), respectively. It is immediately deduced from (A.23) that
hence recalling that \(\Vert L^{\frac{1}{2}}\Vert ^2 = \Vert L\Vert \) (see Remark 3.2) leads us to
So, combining (A.22) and (A.25) amounts to
where \({\bar{\beta }}_M=\frac{1}{\Vert L\Vert \times \Vert M^{-1}\Vert }\). This proves that \({\mathcal B}\) is \({\bar{\beta }}_M\)-cocoercive in \((\mathcal {H}, \langle \cdot ,\cdot \rangle _M)\). \(\square \)
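The first step of this proof rests on the identity \(\langle {\mathcal B}x - {\mathcal B}y, x-y \rangle _M = \langle Bx - By, x-y \rangle \) for \({\mathcal B}=M^{-1}B\). As an illustrative numerical check (under the assumption that M is a symmetric positive definite preconditioner and B is linear, which is one admissible instance, not the general setting of the proposition), this identity can be verified as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Random symmetric positive definite preconditioner M, with
# inner product <u, v>_M = <M u, v>.
P = rng.standard_normal((n, n))
M = P @ P.T + n * np.eye(n)
inner_M = lambda u, v: float(u @ (M @ v))

# A cocoercive operator B: the gradient of a convex quadratic, B(x) = Q x
# with Q symmetric positive definite (hence B is (1/||Q||)-cocoercive).
Q0 = rng.standard_normal((n, n))
Q = Q0 @ Q0.T + np.eye(n)
B = lambda x: Q @ x
B_bar = lambda x: np.linalg.solve(M, B(x))   # the preconditioned operator M^{-1} B

x, y = rng.standard_normal(n), rng.standard_normal(n)

# Identity from the first step of the proof:
#   <B_bar x - B_bar y, x - y>_M = <B x - B y, x - y>.
lhs = inner_M(B_bar(x) - B_bar(y), x - y)
rhs = float((B(x) - B(y)) @ (x - y))
assert abs(lhs - rhs) < 1e-8
```

The identity holds because \(\langle M(M^{-1}d), x-y\rangle = \langle d, x-y\rangle \) by the self-adjointness of M; it is what allows the cocoercivity of B in \((\mathcal {H},\langle \cdot ,\cdot \rangle )\) to be transferred to \({\mathcal B}\) in \((\mathcal {H},\langle \cdot ,\cdot \rangle _M)\).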
1.3 Appendix A.3: Proof of Proposition 3.5
For simplicity, we write G instead of \(G^M_{\mu }\) and, given any mapping \(\Gamma :\mathcal {H}\rightarrow \mathcal {H}\) and any \(\{x_1,x_2\} \subset \mathcal {H}\), we denote \(\Delta \Gamma (x_1,x_2)=\Gamma (x_1)-\Gamma (x_2)\). Let \(\bar{A}=M^{-1}A\), \(\bar{B}=M^{-1}B\) and \(C=I-\mu \bar{B}\). Clearly, \(\bar{A}\) is monotone in \((\mathcal {H},\Vert \cdot \Vert _M)\). It is also seen for \( x \in \mathcal {H}\) that \(G(x) =\mu ^{-1} \big ( x -J_{\mu \bar{A}}\big (C(x) \big )\big ) \), or equivalently
where \(\bar{A}_{\mu }:=\mu ^{-1} \left( I -J_{\mu \bar{A}}\right) \) is the Yosida regularization of \(\bar{A}\). Now, given \((x_1, x_2) \in \mathcal {H}^2\), we get
hence, by \(I-C=\mu \bar{B}\), we equivalently obtain
Let us estimate separately the last two terms on the right-hand side of the previous inequality. As a classical result, by the \(\mu \)-cocoercivity of \(\bar{A}_{\mu }\) in \((\mathcal {H},\Vert \cdot \Vert _M)\), we have
which by \( \bar{A}_{\mu }\circ C =G- \bar{B}\) (from (A.27)) can be rewritten as
Moreover, by \( \bar{A}_{\mu }\circ C =G- \bar{B}\) (from (A.27)), we simply get
Thus, by (A.28) and the previous arguments, we obtain
Hence, recalling that \(\bar{B}=M^{-1}B\), we equivalently obtain
Then by the cocoercivity assumption on B we infer that
Now, let us prove (3.13) for \(i=1\). From an easy computation, we obtain
hence, for \(\delta >0\), using successively Peter–Paul’s inequality and the previous one gives us
Therefore, combining this last inequality with (A.29) entails
that is (3.13) with \(i=1\).
Let us prove (3.13) for \(i=2\). Using again Peter–Paul’s inequality, we readily have
which, in light of (A.29), gives us
that is (3.13) with \(i=2\). \(\square \)
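The proof above relies on the classical fact that the Yosida regularization \(\bar{A}_{\mu }=\mu ^{-1}(I-J_{\mu \bar{A}})\) of a maximally monotone operator is \(\mu \)-cocoercive. As an illustrative check of this fact alone (not of the authors' operator G), one can take the scalar operator \(A=\partial |\cdot |\), whose resolvent \(J_{\mu A}\) is soft-thresholding:

```python
import numpy as np

mu = 0.7

def J(x):
    """Resolvent of mu * (subdifferential of |.|): soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)

def A_mu(x):
    """Yosida regularization A_mu = (I - J_{mu A}) / mu."""
    return (x - J(x)) / mu

rng = np.random.default_rng(1)
pts = rng.uniform(-3.0, 3.0, size=(100, 2))
for x1, x2 in pts:
    d = A_mu(x1) - A_mu(x2)
    # mu-cocoercivity: <A_mu x1 - A_mu x2, x1 - x2> >= mu * |A_mu x1 - A_mu x2|^2
    assert d * (x1 - x2) >= mu * d * d - 1e-12
```

In this example \(A_{\mu }(x)\) reduces to \(\min (\max (x/\mu ,-1),1)\), the gradient of the Moreau envelope of \(|\cdot |\), which is \(1/\mu \)-Lipschitz, so its \(\mu \)-cocoercivity also follows from the Baillon–Haddad theorem.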
1.4 Appendix A.4: A useful inequality
Let us recall the following basic but useful result.
Proposition A.3
For \(\epsilon \in (0,1)\) and \(\{h, a_1, a_2, \sigma _1, \sigma _2 \}\subset [0,\infty )\) such that \(h \le \epsilon \sqrt{\sigma _1\sigma _2}\), we have the following inequality
Proof
Clearly, we have
which completes the proof. \(\square \)
Cite this article
Maingé, PE., Weng-Law, A. Accelerated forward–backward algorithms for structured monotone inclusions. Comput Optim Appl 88, 167–215 (2024). https://doi.org/10.1007/s10589-023-00547-3
Keywords
- Nesterov-type algorithm
- Optimal gradient method
- Inertial-type algorithm
- Global rate of convergence
- Fast first-order method
- Restarting techniques