An Inertial Parallel and Asynchronous Forward–Backward Iteration for Distributed Convex Optimization

  • Regular Paper
  • Published in: Journal of Optimization Theory and Applications

Abstract

Two characteristics that make convex decomposition algorithms attractive are simplicity of operations and generation of parallelizable structures. In principle, these schemes require that all coordinates update at the same time, i.e., they are synchronous by construction. Introducing asynchronicity in the updates can resolve several issues that appear in the synchronous case, such as load imbalances in the computations or failing communication links. However, and to the best of our knowledge, there are no instances of asynchronous versions of commonly known algorithms combined with inertial acceleration techniques. In this work, we propose an inertial asynchronous and parallel fixed-point iteration from which several new versions of existing convex optimization algorithms emanate. Departing from the norm that the frequency of the coordinates’ updates should comply with some prior distribution, we propose a scheme in which the only requirement is that the coordinates update within a bounded interval. We prove convergence of the sequence of iterates generated by the scheme at a linear rate. One instance of the proposed scheme is implemented to solve a distributed optimization load-sharing problem in a smart grid setting, and its superiority with respect to the nonaccelerated version is illustrated.
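To fix ideas, the following is a minimal Python sketch of the kind of iteration described above: a forward (gradient) step, an inertial correction, and a relaxed backward (proximal) step, carried out coordinate-wise on possibly stale copies of the iterate. The toy lasso-type problem, the constants (eta, beta, tau, gamma) and the delay model are illustrative assumptions, not the paper’s algorithm or parameter choices.

```python
import numpy as np

# Illustrative sketch (not the paper's exact scheme): inertial forward-backward
# coordinate updates with bounded, simulated delays for
#     minimize_x  0.5*||A x - b||^2 + lam*||x||_1.
rng = np.random.default_rng(0)
n, m = 20, 40
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
lam = 0.1

L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of grad f
gamma = 1.0 / L                      # forward step size, gamma < 2/L
eta, beta, tau = 0.5, 0.2, 5         # relaxation, inertia, maximum delay (all illustrative)

def grad_f(x):
    return A.T @ (A @ x - b)

def prox_g(v, t):
    # soft-thresholding: prox of t*lam*|.|
    return np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)

x = np.zeros(n)
history = [x.copy()]                 # recent iterates, used to simulate stale reads
prev_write = np.zeros(n)             # value each coordinate held before its last update

for k in range(5000):
    i = k % n                        # every coordinate updates within a bounded interval
    delay = rng.integers(0, tau + 1)
    x_read = history[max(0, len(history) - 1 - delay)]   # possibly outdated copy

    y = x_read[i] - gamma * grad_f(x_read)[i]            # forward step on coordinate i
    y += beta * (x[i] - prev_write[i])                   # inertial correction
    prev_write[i] = x[i]
    x[i] += eta * (prox_g(y, gamma) - x[i])              # relaxed backward (prox) step

    history.append(x.copy())
    if len(history) > tau + 1:
        history.pop(0)

obj = 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.linalg.norm(x, 1)
print(f"objective after 5000 coordinate updates: {obj:.4f}")
```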

Notes

  1. For two nonnegative real numbers x and y, it holds that \(xy\le \frac{\delta x^2}{2}+\frac{y^2}{2\delta }\) for every \(\delta >0\).
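     For a quick numerical illustration (values chosen arbitrarily), take \(x=3\), \(y=4\) and \(\delta =2\):

     $$\begin{aligned} xy = 12 \le \frac{\delta x^2}{2}+\frac{y^2}{2\delta } = \frac{2\cdot 9}{2}+\frac{16}{2\cdot 2} = 9+4 = 13. \end{aligned}$$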

References

  1. Combettes, P.L., Vũ, B.C.: Variable metric forward–backward splitting with applications to monotone inclusions in duality. Optimization 63(9), 1289–1318 (2014)

  2. Alvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 9(1), 3–11 (2001)

  3. Liu, J., Wright, S.J.: Asynchronous stochastic coordinate descent: parallelism and convergence properties. SIAM J. Optim. 25(1), 351–376 (2015)

  4. Peng, Z., Xu, Y., Yan, M., Yin, W.: ARock: an algorithmic framework for asynchronous parallel coordinate updates. SIAM J. Sci. Comput. 38(5), A2851–A2879 (2016)

  5. Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation. Prentice Hall Inc., New Jersey (1989)

  6. Wright, S.J.: Coordinate descent algorithms. Math. Program. 151(1), 3–34 (2015)

  7. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011)

  8. Ryu, E.K., Boyd, S.: A primer on monotone operator methods. Appl. Comput. Math. 15(1), 3–43 (2016)

  9. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5), 1–17 (1964)

  10. Ochs, P., Brox, T., Pock, T.: iPiasco: inertial proximal algorithm for strongly convex optimization. J. Math. Imaging Vis. 53(2), 171–181 (2015)

  11. Gurbuzbalaban, M., Ozdaglar, A., Parrilo, P.: On the convergence rate of incremental aggregated gradient algorithms. SIAM J. Optim. 27(2), 1035–1048 (2017)

  12. Polyak, B.: Introduction to Optimization. Optimization Software Inc., Publication Division, New York (1987)

  13. Ghadimi, E., Feyzmahdavian, H.R., Johansson, M.: Global convergence of the heavy-ball method for convex optimization. In: 2015 European Control Conference (ECC), pp. 310–315. IEEE (2015)

  14. Moudafi, A., Oliny, M.: Convergence of a splitting inertial proximal method for monotone operators. J. Comput. Appl. Math. 155(2), 447–454 (2003)

  15. Krasnosel’skiĭ, M.A.: Two remarks on the method of successive approximations. Uspekhi Matematicheskikh Nauk 10(1), 123–127 (1955)

  16. Mann, W.R.: Mean value methods in iteration. Proc. Am. Math. Soc. 4(3), 506–510 (1953)

  17. Liang, J., Fadili, J., Peyré, G.: Convergence rates with inexact non-expansive operators. Math. Program. 159, 1–32 (2014)

  18. Alvarez, F.: Weak convergence of a relaxed and inertial hybrid projection-proximal point algorithm for maximal monotone operators in Hilbert space. SIAM J. Optim. 14(3), 773–782 (2004)

  19. Maingé, P.E.: Convergence theorems for inertial KM-type algorithms. J. Comput. Appl. Math. 219(1), 223–236 (2008)

  20. Iutzeler, F., Hendrickx, J.M.: A generic linear rate acceleration of optimization algorithms via relaxation and inertia. arXiv preprint arXiv:1603.05398v2 (2016)

  21. Feyzmahdavian, H.R., Aytekin, A., Johansson, M.: A delayed proximal gradient method with linear convergence rate. In: IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6 (2014)

  22. Combettes, P.L., Eckstein, J.: Asynchronous block-iterative primal-dual decomposition methods for monotone inclusions. Math. Program. 168, 645–672 (2016)

  23. Mishchenko, K., Iutzeler, F., Malick, J.: A Distributed Flexible Delay-tolerant Proximal Gradient Algorithm. arXiv preprint arXiv:1806.09429 (2018)

  24. Raguet, H., Fadili, J., Peyré, G.: A generalized forward–backward splitting. SIAM J. Imaging Sci. 6(3), 1199–1226 (2013)

  25. Raguet, H., Landrieu, L.: Preconditioning of a generalized forward–backward splitting and application to optimization on graphs. SIAM J. Imaging Sci. 8(4), 2706–2739 (2015)

  26. Briceño-Arias, L.M.: Forward-Douglas-Rachford splitting and forward-partial inverse method for solving monotone inclusions. arXiv preprint arXiv:1212.5942 (2012)

  27. Davis, D.: Convergence rate analysis of the forward-Douglas-Rachford splitting scheme. arXiv preprint arXiv:1410.2654 (2015)

  28. Combettes, P.L., Condat, L., Pesquet, J.C., Vũ, B.C.: A forward–backward view of some primal-dual optimization methods in image recovery. In: The IEEE International Conference on Image Processing, pp. 4141–4145 (2014)

  29. Dunning, I., Huchette, J., Lubin, M.: JuMP: a modeling language for mathematical optimization. SIAM Rev. 59(2), 295–320 (2017)

  30. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 67, 301–320 (2005)

  31. Fabietti, L., Gorecki, T.T., Namor, E., Sossan, F., Paolone, M., Jones, C.N.: Dispatching active distribution networks through electrochemical storage systems and demand side management (2017)

  32. Gorecki, T.T., Qureshi, F.A., Jones, C.N.: OpenBuild: an integrated simulation environment for building control (2015)

  33. Swissgrid: Test for Secondary Control Capability. https://www.swissgrid.ch/dam/swissgrid/customers/topics/ancillary-services/prequalification/4/D171130-Test-for-secondary-control-capability-V3R0-EN.pdf (2003)

Acknowledgements

This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (Grant Agreement No. 755445).

Author information

Corresponding author

Correspondence to Giorgos Stathopoulos.

Additional information

Alexandre Cabot.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A.1 Proof of Lemma 5.1

Proof

Using Assumption 1 and Eq. (7), we start by rewriting \(y_B=(T_B x^i_\mathrm {read})[i]\):

$$\begin{aligned} (T_B x^i_\mathrm {read})[i]&= x^i_\mathrm {read}[i] - \gamma (Bx^i_\mathrm {read})[i] \\&= x_{k}[i] - a_k^i[i] - \gamma (Bx_{k})[i] + \gamma (Bx_{k})[i] - \gamma (Bx^i_\mathrm {read})[i] \\&= (T_Bx_k)[i] + \gamma ((Bx_{k})[i]-(Bx^i_\mathrm {read})[i]) - a_k^i[i] \\&= (T_Bx_k)[i] + d_k^i[i], \end{aligned}$$

where \(d_k[i]=\gamma \left( (Bx_{k})[i]-(Bx^i_\mathrm {read})[i]\right) - a_k^i[i]\).

Similarly, we have from Eq. (7) that

$$\begin{aligned} \beta ( y_\mathrm {write} - y_\mathrm {write}^\mathrm {prev} )&= \beta ( x_k[i] - b_k^i[i] - x_k[i] + c_k^i[i] ) \\&= \beta (c_k^i[i] - b_k^i[i]). \end{aligned}$$

Using the above relations, a coordinate update of iteration (6) can be expressed as

$$\begin{aligned} x_{k+1}[i] = x_k[i] + \eta (T_{A_{i}}( (T_B x_k)[i] + d_k[i] + \beta (c_k^i[i] - b_k^i[i])) - x_k[i]), \end{aligned}$$

or, equivalently, as

$$\begin{aligned} x_{k+1}[i] = x_k[i] + \eta ( T_{A_{i}}((T_B x_k)[i]) - x_k[i] + e_k[i]), \end{aligned}$$

with \(e_k[i] = T_{A_{i}}((T_Bx_k)[i] + d_k[i] + \beta (c_k^i[i]-b_k^i[i])) - T_{A_{i}}((T_Bx_k)[i])\), which concludes the proof. \(\square \)
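For later reference, stacking the coordinate updates (with \(T_A\) acting blockwise through the operators \(T_{A_i}\)) and writing \(T=T_AT_B\) and \(S=I-T\), the iteration takes the compact form

$$\begin{aligned} x_{k+1} = x_k + \eta \left( Tx_k - x_k + e_k\right) = x_k - \eta \left( Sx_k - e_k\right) , \end{aligned}$$

which is the form in which Eq. (9) is used in the proofs below.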

A.2 Proof of Lemma 5.2

Proof

Squaring (9) we get:

$$\begin{aligned} \Vert x_{k+1}-x_*\Vert ^2 = \Vert x_{k}-x_*\Vert ^2 -2\eta \langle x_k-x_*, Sx_k-e_k \rangle +\eta ^2\Vert Sx_k-e_k\Vert ^2. \end{aligned}$$
(22)

Let us now analyze the second and third terms in (22).

  • Bound \(-2\eta \langle x_k-x_*, Sx_k-e_k \rangle \): We will upper-bound the resulting inner product terms. In order to do so, we use both the cocoercivity and the quasi-strong monotonicity of S, the former proven in ‘Appendix A.6,’ and the latter following from Assumption 1. Since S is 1/2-cocoercive, we have that

    $$\begin{aligned} \langle x_k-x_*,Sx_k\rangle \ge \frac{1}{2}\Vert Sx_k\Vert ^2. \end{aligned}$$

    From the quasi-\(\nu \)-strong monotonicity of S, we have:

    $$\begin{aligned} \langle x_k-x_*,Sx_k\rangle \ge \nu \Vert x_k-x_*\Vert ^2. \end{aligned}$$

    Putting these two together, we get that

    $$\begin{aligned} -2\eta \langle x_k-x_*,Sx_k\rangle \le -\eta \nu {\mathbf {dist} }_k^2 -\frac{\eta }{2}\Vert Sx_k\Vert ^2. \end{aligned}$$
    (23)

    For the second inner product term involving the error, we can easily derive the bound

    $$\begin{aligned} 2\eta \langle x_k-x_*,e_k\rangle \le 2\eta {\mathbf {dist} }_k\Vert e_k\Vert . \end{aligned}$$
    (24)

    Equations (23) and (24) result in the bound

    $$\begin{aligned} -2\eta \langle x_k-x_*, Sx_k-e_k\rangle \le -\eta \nu {\mathbf {dist} }_k^2-\frac{\eta }{2}\Vert Sx_k\Vert ^2 + 2\eta {\mathbf {dist} }_k\Vert e_k\Vert . \end{aligned}$$
    (25)
  • Bound \(\eta ^2\Vert Sx_k-e_k\Vert ^2\): By expanding the square, we have that

    $$\begin{aligned} \eta ^2\Vert Sx_k-e_k\Vert ^2&= \eta ^2 (\Vert Sx_k\Vert ^2 -2\langle Sx_k,e_k\rangle + \Vert e_k\Vert ^2). \end{aligned}$$
    (26)

    The inner product term in (26) can be bounded by employing Young’s inequality (see Footnote 1) as follows:

    $$\begin{aligned} -\,2\langle Sx_k,e_k\rangle&\le 2\Vert Sx_k\Vert \Vert e_k\Vert \nonumber \\&\le 2\left( \frac{\delta }{2}\Vert Sx_k\Vert ^2+\frac{1}{2\delta }\Vert e_k\Vert ^2\right) \nonumber \\&= \delta \Vert Sx_k\Vert ^2 + \frac{1}{\delta }\Vert e_k\Vert ^2, \end{aligned}$$
    (27)

    for any \(\delta >0\). Putting together (26) and (27), we get the bound:

    $$\begin{aligned} \eta ^2\Vert Sx_k-e_k\Vert ^2 \le \eta ^2(1+\delta )\Vert Sx_k\Vert ^2 + \eta ^2\frac{(\delta +1)}{\delta }\Vert e_k\Vert ^2. \end{aligned}$$
    (28)

Using (25) and (28), inequality (22) can be written as

$$\begin{aligned} {\mathbf {dist} }_{k+1}^2\le & {} (1-\eta \nu ){\mathbf {dist} }_k^2 +\eta \left( -\frac{1}{2}+\eta (1+\delta )\right) \Vert Sx_k\Vert ^2\nonumber \\&+2\eta {\mathbf {dist} }_k\Vert e_k\Vert +\eta ^2\frac{(\delta +1)}{\delta }\Vert e_k\Vert ^2 . \end{aligned}$$
(29)

The second term in the sum can be eliminated by assuming that

$$\begin{aligned} -\frac{1}{2}+\eta (1+\delta )<0\Rightarrow \eta <\frac{1}{2(1+\delta )}, \end{aligned}$$
(30)

which gives rise to the inequality

$$\begin{aligned} {\mathbf {dist} }_{k+1}^2 \le (1-\eta \nu ){\mathbf {dist} }_k^2 +2\eta {\mathbf {dist} }_k\Vert e_k\Vert +\eta ^2\frac{(\delta +1)}{\delta }\Vert e_k\Vert ^2 . \end{aligned}$$
(31)

The remaining cross term on the right-hand side can be eliminated by applying Young’s inequality once more, i.e.,

$$\begin{aligned} 2\eta {\mathbf {dist} }_k\Vert e_k\Vert&\le 2\eta \left( \frac{\epsilon }{2}{\mathbf {dist} }_k^2+\frac{1}{2\epsilon }\Vert e_k\Vert ^2\right) \\&= \eta \epsilon {\mathbf {dist} }_k^2+\frac{\eta }{\epsilon }\Vert e_k\Vert ^2. \end{aligned}$$

Substituting this bound into (31) yields

$$\begin{aligned} {\mathbf {dist} }_{k+1}^2 \le (1-\eta \nu +\eta \epsilon ){\mathbf {dist} }_k^2 +\eta \left( \frac{1}{\epsilon }+\frac{\eta (1+\delta )}{\delta }\right) \Vert e_k\Vert ^2 , \end{aligned}$$

which completes the proof. \(\square \)

A.3 Proof of Lemma 5.3

Proof

We will bound the error term \(\Vert e_k\Vert \) componentwise. For some arbitrary \(i\in \{1,\ldots ,N\}\) and \(k\in \mathbb {N}\), we have from (13) that

$$\begin{aligned} \Vert e_k[i]\Vert \le \Vert d_k[i]\Vert + \beta (\Vert c_k^i[i]\Vert +\Vert b_k^i[i]\Vert ). \end{aligned}$$

Consequently,

$$\begin{aligned} \Vert e_k\Vert \le (1+\gamma L)N\underset{1\le i\le N}{\max }\Vert a_k^i\Vert + \beta N\underset{1\le i\le N}{\max }(\Vert c_k^i\Vert +\Vert b_k^i\Vert ). \end{aligned}$$
(32)

The first term can be recovered by using the 1/L-cocoercivity of B (Assumption 1) in (11), as well as the inequality \(\Vert a_k^i[i]\Vert \le \Vert a_k^i\Vert \), while the second term follows from the inequality

$$\begin{aligned} \Vert c_k^i[i]\Vert +\Vert b_k^i[i]\Vert \le \Vert c_k^i\Vert +\Vert b_k^i\Vert . \end{aligned}$$
  • We want to bound the two summands of (32). Let us start by bounding \(\Vert a_k^i\Vert \), for which we have, for all i:

    $$\begin{aligned} \underset{1\le i\le N}{\max }\Vert a_k^i\Vert&\le \sum _{m=k-2\tau }^{k-1}\Vert x_{m+1}-x_m\Vert \nonumber \\&= \eta \sum _{m=k-2\tau }^{k-1}\Vert e_m-Sx_m\Vert \nonumber \\&\le \eta \left( \sum _{m=k-2\tau }^{k-1}\Vert e_m\Vert +\sum _{m=k-2\tau }^{k-1}\Vert Sx_m\Vert \right) \nonumber \\&\le \eta \sum _{m=k-2\tau }^{k-1}(\Vert d_m\Vert + \beta \Vert c_m-b_m\Vert + \Vert Sx_m\Vert ). \end{aligned}$$
    (33)

    The first inequality follows from the definitions of \(a_k^i\) in (7), the first equality from (9), while the last two inequalities follow from the triangle inequality and (13).

  • Bound \(\Vert c_k-b_k\Vert \): Following the same process as in (33), we have that

    $$\begin{aligned} \underset{1\le i\le N}{\max }(\Vert c_k^i\Vert +\Vert b_k^i\Vert )&\le \sum _{m=k-3\tau }^{k-1}\Vert x_{m+1}-x_m\Vert + \sum _{m=k-2\tau }^{k-1}\Vert x_{m+1}-x_m\Vert \nonumber \\&\le \eta \left( \sum _{m=k-3\tau }^{k-1}(\Vert d_m\Vert + \beta \Vert c_m-b_m\Vert + \Vert Sx_m\Vert ) \right. \nonumber \\&\left. \quad +\,\sum _{m=k-2\tau }^{k-1}(\Vert d_m\Vert + \beta \Vert c_m-b_m\Vert + \Vert Sx_m\Vert )\right) . \end{aligned}$$
    (34)

Finally, (33) and (34) can be bounded by means of the quantity (\(\varSigma \)), and substituting (33) into (32) yields the result. \(\square \)

A.4 Proof of Lemma 5.4

Proof

Let us start by bounding the quantities involved in (\(\varSigma \)), namely \(\Vert b_k\Vert \), \(\Vert c_k\Vert \) and \(\Vert d_k\Vert \) with respect to the maximum distance from the optimizer. The following inequalities hold:

$$\begin{aligned} \Vert b_k\Vert&\le N\underset{1\le i\le N}{\max }\Vert b_k^i[i]\Vert \le N\underset{1\le i\le N}{\max }\Vert b_k^i\Vert \nonumber \\ \Vert c_k\Vert&\le N\underset{1\le i\le N}{\max }\Vert c_k^i[i]\Vert \le N\underset{1\le i\le N}{\max }\Vert c_k^i\Vert \nonumber \\ \Vert d_k\Vert&\le (1+\gamma L)N\underset{1\le i\le N}{\max }\Vert a_k^i\Vert . \end{aligned}$$
(35)

Note also that, for all \(i=1,\ldots ,N\) and for \(l_i\in \{1,\ldots ,K\}\), it holds that

$$\begin{aligned} \Vert x_k-x_{k-l_i}\Vert&= \Vert x_k-x_*+x_*-x_{k-l_i}\Vert \\&\le \Vert x_k-x_*\Vert + \Vert x_*-x_{k-l_i}\Vert . \end{aligned}$$

Since the above inequality holds for all \(i=1,\ldots ,N\), by denoting \(i_*={{\text { argmax}}}_{{i\in \{1,\ldots ,N\}}}\Vert x_k-x_{k-l_i}\Vert \), we get

$$\begin{aligned} \underset{1\le i\le N}{\max }\Vert x_k-x_{k-l_i}\Vert&= \Vert x_k-x_{k-l_{i_*}}\Vert \\&\le (\Vert x_k-x_*\Vert + \Vert x_*-x_{k-l_{i_*}}\Vert ) . \end{aligned}$$

From the definition of \(a_k^i\) in (7), we have that

$$\begin{aligned} \underset{1\le i\le N}{\max }\Vert a_k^i\Vert&\le \underset{1\le i\le N}{\max }\Vert x_k-x_{k-l_i}\Vert ,\quad l_i\in \{1,\ldots ,2\tau \} \nonumber \\&\le \left( \Vert x_k-x_*\Vert + \Vert x_*-x_{k-l_{i_*}}\Vert \right) \nonumber \\&\le \left( \Vert x_k-x_*\Vert + \underset{k-2\tau \le m \le k-1}{\max }{\mathbf {dist} }_m\right) \nonumber \\&\le 2\underset{k-2\tau \le m \le k}{\max }{\mathbf {dist} }_m, \end{aligned}$$
(36)

for some \(l_i\in \{1,\ldots ,2\tau \}\).

Substituting in (35), and following developments similar to (36), we conclude that

$$\begin{aligned} \Vert b_k\Vert&\le 2N\underset{k-2\tau \le m \le k}{\max }{\mathbf {dist} }_m\nonumber \\ \Vert c_k\Vert&\le 2N\underset{k-3\tau \le m \le k}{\max }{\mathbf {dist} }_m \nonumber \\ \Vert d_k\Vert&\le 2(1+\gamma L)N\underset{k-2\tau \le m \le k}{\max }{\mathbf {dist} }_m \end{aligned}$$
(37)

Using (37), the sums can be easily bounded as shown below.

$$\begin{aligned} \sum _{m=k-K}^{k-1}\Vert a_m\Vert&\le 2NK\underset{k-K-2\tau \le j \le k-1}{\max }{\mathbf {dist} }_j \nonumber \\ \sum _{m=k-K}^{k-1}\Vert b_m\Vert&\le 2NK\underset{k-K-2\tau \le j \le k-1}{\max }{\mathbf {dist} }_j \nonumber \\ \sum _{m=k-K}^{k-1}\Vert c_m\Vert&\le 2NK\underset{k-K-3\tau \le j \le k-1}{\max }{\mathbf {dist} }_j \nonumber \\ \sum _{m=k-K}^{k-1}\Vert d_m\Vert&\le 2(1+\gamma L)NK\underset{k-K-2\tau \le j \le k-1}{\max }{\mathbf {dist} }_j \nonumber \\ \sum _{m=k-K}^{k-1}\Vert Sx_m\Vert&\le 2K\underset{k-K \le j \le k-1}{\max }{\mathbf {dist} }_j, \end{aligned}$$
(38)

the last inequality following from Corollary 4.1.

From the definition of \(\varSigma _K(k)\) in (\(\varSigma \)) and from (38), by introducing

$$\begin{aligned} Y:=1+\gamma L+2\beta , \end{aligned}$$

we have that

$$\begin{aligned} \varSigma _K(k)&\le \sum _{m=k-K}^{k-1}(\Vert d_m\Vert + \beta \Vert c_m\Vert + \beta \Vert b_m\Vert + \Vert Sx_m\Vert ) \\&\le \underbrace{2K(YN+1)}_{W(K)}\underset{k-K-3\tau \le j \le k-1}{\max }{\mathbf {dist} }_j \end{aligned}$$

Since (\(\varSigma \)) is bounded, we can accordingly bound (13a) and (13b):

$$\begin{aligned} \Vert c_k-b_k\Vert&\le \eta N2\varSigma _{3\tau }(k) \nonumber \\&\le \eta N2W(3\tau )\underset{k-6\tau \le j \le k-1}{\max }{\mathbf {dist} }_j \nonumber \\&= \eta 2(3\tau )N(YN+1) \underset{k-6\tau \le j \le k-1}{\max }{\mathbf {dist} }_j \end{aligned}$$
(39a)
$$\begin{aligned} \Vert d_k\Vert&\le \eta N(1+\gamma L)\varSigma _{2\tau }(k) \nonumber \\&\le \eta N(1+\gamma L)W(2\tau )\underset{k-5\tau \le j \le k-1}{\max }{\mathbf {dist} }_j \nonumber \\&= \eta (1+\gamma L)4\tau N(YN+1)\underset{k-5\tau \le j \le k-1}{\max }{\mathbf {dist} }_j. \end{aligned}$$
(39b)

Using (39a) and (39b), \(\Vert e_k\Vert \) from (13) can be bounded as

$$\begin{aligned} \Vert e_k\Vert&\le \Vert d_k\Vert + \beta \Vert c_k-b_k\Vert \nonumber \\&\le \eta \underbrace{N(YN+1)(4\tau (1+\gamma L)+6\beta \tau )}_{X}\underset{k-6\tau \le j \le k-1}{\max }{\mathbf {dist} }_j, \end{aligned}$$
(40)

where both terms were bounded using the larger of the two delay windows appearing in (39a) and (39b). \(\square \)

A.5 Proof of Theorem 5.1

Proof

Note that (16) simplifies to

$$\begin{aligned} \eta ^2X^2\left( \frac{1}{\epsilon }+\frac{\eta (1+\delta )}{\delta }\right) < \nu -\epsilon . \end{aligned}$$

As a result, we need to find parameters \(\eta ,\beta ,\gamma ,\delta ,\epsilon \) such that the following set of inequalities is satisfied:

$$\begin{aligned}&\eta ^2X^2\left( \frac{1}{\epsilon }+\frac{\eta (1+\delta )}{\delta }\right)< \nu -\epsilon ,\nonumber \\&Y = 1+\gamma L+2\beta ,\nonumber \\&X = N(YN+1)(4\tau (1+\gamma L)+6\beta \tau ),\nonumber \\&\delta>0,\nonumber \\&\epsilon>0,\nonumber \\&\beta >0,\nonumber \\&0<\gamma<\gamma _{\max },\nonumber \\&0<\eta <\frac{1}{2(1+\delta )}. \end{aligned}$$
(41)

The upper bound \(\gamma _{\max }\) ensures that the stepsize \(\gamma \) is admissible (a possible option is, e.g., \(\gamma _{\max }=2/L\), as proven in ‘Appendix A.6’). We first note that the values of \(\delta \) and \(\epsilon \) are irrelevant as long as they are positive; in particular, we can choose \(\epsilon \) such that \(\nu -\epsilon >0\). From the inequality \(\eta <1/(2(1+\delta ))\), it follows that

$$\begin{aligned} \frac{1}{\epsilon } + \frac{\eta (1+\delta )}{\delta } < \frac{2\delta +\epsilon }{2\delta \epsilon }, \end{aligned}$$

thus having

$$\begin{aligned} \eta ^2 < \frac{2\delta \epsilon (\nu -\epsilon )}{X^2(2\delta +\epsilon )}, \end{aligned}$$

from which the result follows. \(\square \)
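For concreteness, the following Python sketch carries out this parameter selection for illustrative problem constants (the values of L, mu, N, tau, gamma, beta and delta below are assumptions made for the example, not values from the paper), with \(\nu \) computed from the expression derived in ‘Appendix A.6’:

```python
import numpy as np

# Illustrative parameter selection following (41); all constants are example values.
L, mu = 10.0, 1.0        # Lipschitz and strong-monotonicity constants of B (assumed)
N, tau = 4, 2            # number of coordinates and maximum delay (assumed)

gamma = 1.0 / L          # any gamma in (0, 2/L) is admissible
beta = 0.05              # inertia parameter
delta = 1.0              # any positive value works
nu = 1.0 - np.sqrt(1.0 - 2.0 * gamma * mu + mu * gamma**2 * L)  # quasi-strong monotonicity constant
eps = nu / 2.0           # any eps in (0, nu)

Y = 1.0 + gamma * L + 2.0 * beta
X = N * (Y * N + 1.0) * (4.0 * tau * (1.0 + gamma * L) + 6.0 * beta * tau)

# eta must satisfy eta < 1/(2(1+delta)) and eta^2 < 2*delta*eps*(nu - eps)/(X^2*(2*delta + eps)).
eta_max = min(1.0 / (2.0 * (1.0 + delta)),
              np.sqrt(2.0 * delta * eps * (nu - eps) / (X**2 * (2.0 * delta + eps))))
eta = 0.9 * eta_max

# Sanity check of the simplified condition stated at the start of the proof.
assert eta**2 * X**2 * (1.0 / eps + eta * (1.0 + delta) / delta) < nu - eps
print(f"nu = {nu:.4f}, X = {X:.1f}, admissible eta up to {eta_max:.2e}")
```

The admissible step size produced this way is conservative, since X grows with the number of coordinates and the maximum delay.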

A.6 Proof of Corollary 4.1

Proof

  • From Assumption 1, B is 1/L-cocoercive, which means that \(\gamma B\) is \(1/(\gamma L)\)-cocoercive. It follows from [24, Proposition 4.13] that \(T_B=I-\gamma B\) is \(\gamma L/2\)-averaged. From [7, Proposition 4.25 (i)], it follows that \(T_B\) is nonexpansive provided that \(\gamma <2/L\). Finally, we conclude that T is nonexpansive as the composition of nonexpansive operators.

  • From [7, Proposition 4.33], we have that S is 1/2-cocoercive if and only if T is nonexpansive, which was proven above.

  • The claim is proven in [4, Proposition 2] for the case of the proximal gradient method. The proof below is essentially the same, generalized to a generic operator T. From [7, Example 22.5], we have that if T is c-Lipschitz continuous for some \(c\in [0,1)\), then \(I-T\) is \((1-c)\)-strongly monotone. Let us then prove that T is indeed Lipschitz continuous. For any \(x\in \mathcal {H}\) and \(x_*\in {\text {fix}} T\), it holds that:

    $$\begin{aligned} \Vert T_Bx-T_Bx_*\Vert ^2&= \Vert x-x_*\Vert ^2-2\gamma \langle x-x_*,Bx-Bx_*\rangle +\gamma ^2\Vert Bx-Bx_*\Vert ^2 \nonumber \\&\le \Vert x-x_*\Vert ^2-\gamma (2-\gamma L)\langle x-x_*,Bx-Bx_*\rangle \nonumber \\&\le \Vert x-x_*\Vert ^2-\mu \gamma (2-\gamma L)\Vert x-x_*\Vert ^2 \nonumber \\&= (1-2\gamma \mu +\mu \gamma ^2L)\Vert x-x_*\Vert ^2, \end{aligned}$$

    where the first inequality follows from the 1/L-cocoercivity of B, while the second one follows from the \(\mu \)-strong monotonicity of B.

    Thus, \(\Vert T_Bx-T_Bx_*\Vert \le \sqrt{(1-2\gamma \mu +\mu \gamma ^2L)}\Vert x-x_*\Vert \) and since \(T_A\) is nonexpansive, we have that \(\Vert Tx-Tx_*\Vert \le \sqrt{(1-2\gamma \mu +\mu \gamma ^2L)}\Vert x-x_*\Vert \). Finally, S is quasi-\(\nu \)-strongly monotone with \(\nu =1-\sqrt{(1-2\gamma \mu +\mu \gamma ^2L)}\) for \(\gamma <2/L\).

\(\square \)

A.7 Proof of Lemma 6.1

Proof

Since \(T_A=\mathbf{prox}_{\gamma g}\), it is firmly nonexpansive from [7, Proposition 12.27], and 1/2-averaged from [7, Proposition 4.2]. In addition, as we saw in ‘Appendix A.6,’ \(T_B=I-\gamma \nabla f\) is \(\gamma L/2\)-averaged from [24, Proposition 4.13]. Since \(T=T_AT_B\), the composition is \(\alpha \)-averaged from [7, Proposition 4.24], with \(\alpha =\max \left\{ \frac{2}{3},\frac{2}{1+\frac{2}{\gamma L}}\right\} \). Making use of [7, Proposition 4.25 (iii)], we have that for all \(x, y\in {\mathbb {R} }^n\):

$$\begin{aligned}&\frac{1-\alpha }{\alpha }\Vert Sx-Sy\Vert ^2 +\Vert Tx-Ty\Vert ^2 \le \Vert x-y\Vert ^2 \\&(1-\alpha )\Vert Sx-Sy\Vert ^2 +\alpha \Vert (I-S)x-(I-S)y\Vert ^2 \le \alpha \Vert x-y\Vert ^2 \\&(1-\alpha )\Vert Sx-Sy\Vert ^2 +\alpha (\Vert x-y\Vert ^2-2\langle x-y,Sx-Sy\rangle +\Vert Sx-Sy\Vert ^2) \le \alpha \Vert x-y\Vert ^2 \end{aligned}$$

After simplification, we obtain

$$\begin{aligned} \langle x-y,Sx-Sy\rangle \ge \frac{1}{2\alpha }\Vert Sx-Sy\Vert ^2, \end{aligned}$$

which concludes the proof. \(\square \)
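The cocoercivity constant of Lemma 6.1 can also be checked numerically on a toy instance. In the sketch below, f is a least-squares term and g a weighted \(\ell _1\)-norm; the data, the value of \(\gamma \) and the number of test points are arbitrary illustrative choices, not taken from the paper.

```python
import numpy as np

# Numerical sanity check of Lemma 6.1 on a toy instance:
#   f(x) = 0.5*||A x - b||^2,  g(x) = lam*||x||_1,
#   T = prox_{gamma*g} o (I - gamma*grad f),  S = I - T.
# The lemma gives <x - y, Sx - Sy> >= (1/(2*alpha)) * ||Sx - Sy||^2
# with alpha = max{2/3, 2/(1 + 2/(gamma*L))}.
rng = np.random.default_rng(1)
n, m = 10, 15
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
lam = 0.5
L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of grad f
gamma = 1.5 / L                    # any gamma < 2/L

def T(x):
    v = x - gamma * (A.T @ (A @ x - b))                           # forward step I - gamma*grad f
    return np.sign(v) * np.maximum(np.abs(v) - gamma * lam, 0.0)  # prox of gamma*lam*|.|

def S(x):
    return x - T(x)

alpha = max(2.0 / 3.0, 2.0 / (1.0 + 2.0 / (gamma * L)))

for _ in range(1000):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    sx, sy = S(x), S(y)
    assert np.dot(x - y, sx - sy) >= np.linalg.norm(sx - sy) ** 2 / (2.0 * alpha) - 1e-10

print("Lemma 6.1 cocoercivity inequality held on 1000 random pairs")
```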


Cite this article

Stathopoulos, G., Jones, C.N. An Inertial Parallel and Asynchronous Forward–Backward Iteration for Distributed Convex Optimization. J Optim Theory Appl 182, 1088–1119 (2019). https://doi.org/10.1007/s10957-019-01542-7
