
Inertial proximal incremental aggregated gradient method with linear convergence guarantees

  • Original Article
Mathematical Methods of Operations Research

Abstract

In this paper, we propose an inertial version of the Proximal Incremental Aggregated Gradient method (abbreviated iPIAG) for minimizing the sum of smooth convex component functions and a possibly nonsmooth convex regularization function. First, we prove that iPIAG converges linearly under gradient Lipschitz continuity and strong convexity, together with an upper bound estimate for the inertial parameter. Then, by employing a recent Lyapunov-function-based technique, we derive a weaker linear convergence guarantee, which replaces strong convexity with a quadratic growth condition. Finally, we present two numerical experiments illustrating that iPIAG outperforms the original PIAG.

Notes

  1. https://github.com/tiepvupsu/FISTA.git

References

  • Aytekin A (2019) Asynchronous first-order algorithms for large-scale optimization: analysis and implementation. PhD thesis, KTH Royal Institute of Technology

  • Aytekin A, Feyzmahdavian HR, Johansson M (2016) Analysis and implementation of an asynchronous optimization algorithm for the parameter server. arXiv preprint arXiv:1610.05507

  • Beck A (2017) First-order methods in optimization. SIAM

  • Beck A, Shtern S (2017) Linearly convergent away-step conditional gradient for non-strongly convex functions. Math Program 164(1–2):1–27

  • Beck A, Teboulle M (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imag Sci 2(1):183–202

  • Bolte J, Nguyen TP, Peypouquet J, Suter BW (2017) From error bounds to the complexity of first-order descent methods for convex functions. Math Program 165(2):471–507

  • Chretien S (2010) An alternating \( \ell _1 \) approach to the compressed sensing problem. IEEE Signal Process Lett 17(2):181–184

  • Combettes PL, Glaudin LE (2017) Quasi-nonexpansive iterations on the affine hull of orbits: from Mann's mean value algorithm to inertial methods. SIAM J Optim 27(4):2356–2380

  • Blatt D, Hero AO, Gauchman H (2007) A convergent incremental gradient method with a constant step size. SIAM J Optim 18(1):29–51

  • Drusvyatskiy D, Lewis AS (2013) Tilt stability, uniform quadratic growth, and strong metric regularity of the subdifferential. SIAM J Optim 23(1):256–267

  • Drusvyatskiy D, Lewis AS (2018) Error bounds, quadratic growth, and linear convergence of proximal methods. Math Oper Res 43(3):919–948

  • Alvarez F, Attouch H (2001) An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal 9(1):3–11

  • Feyzmahdavian HR, Aytekin A, Johansson M (2014) A delayed proximal gradient method with linear convergence rate. In: 2014 IEEE international workshop on machine learning for signal processing (MLSP), pp 1–6. IEEE

  • Gurbuzbalaban M, Ozdaglar A, Parrilo PA (2017) On the convergence rate of incremental aggregated gradient algorithms. SIAM J Optim 27(2):1035–1048

  • Hale ET, Yin W, Zhang Y (2007) A fixed-point continuation method for \( \ell _1 \)-regularized minimization with applications to compressed sensing. CAAM TR07-07, Rice University, 43:44

  • Hoffman AJ (1952) On approximate solutions of systems of linear inequalities. J Res Natl Bur Stand 49(4):263–265

  • Jia Z, Huang J, Cai X (2021) Proximal-like incremental aggregated gradient method with Bregman distance in weakly convex optimization problems. J Global Optim 1–24

  • Liang J, Fadili J, Peyré G (2016) A multi-step inertial forward-backward splitting method for non-convex optimization. In: Advances in neural information processing systems, pp 4035–4043

  • Johnstone PR, Moulin P (2017) Local and global convergence of a general inertial proximal splitting scheme for minimizing composite functions. Comput Optim Appl 67(2):259–292

  • László SC (2021) Convergence rates for an inertial algorithm of gradient type associated to a smooth non-convex minimization. Math Program 190(1):285–329

  • Latafat P, Themelis A, Ahookhosh M, Patrinos P (2021) Bregman Finito/MISO for nonconvex regularized finite sum minimization without Lipschitz gradient continuity. arXiv preprint arXiv:2102.10312

  • Li G, Pong TK (2018) Calculus of the exponent of Kurdyka-Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found Comput Math 18(5):1199–1232

  • Liu Y, Xia F (2021) Variable smoothing incremental aggregated gradient method for nonsmooth nonconvex regularized optimization. Optim Lett 1–18

  • Li M, Zhou L, Yang Z, Li A, Xia F, Andersen DG, Smola A (2013) Parameter server for distributed machine learning. In: Big Learning NIPS Workshop, vol 6, p 2

  • Łojasiewicz S (1959) Sur le problème de la division. Studia Math 18:87–136

  • Łojasiewicz S (1958) Division d’une distribution par une fonction analytique de variables réelles. Comptes Rendus Hebdomadaires des Séances de l’Académie des Sciences 246(5):683–686

  • Meier L, Geer SV, Bühlmann P (2008) The group lasso for logistic regression. J Royal Stat Soc: Ser B (Stat Methodol) 70(1):53–71

  • Necoara I, Nesterov Y, Glineur F (2019) Linear convergence of first order methods for non-strongly convex optimization. Math Program 175(1):69–107

  • Nesterov Y (2013) Gradient methods for minimizing composite functions. Math Program 140(1):125–161

  • Ochs P (2018) Local convergence of the heavy-ball method and iPiano for non-convex optimization. J Optim Theory Appl 177(1):153–180

  • Ochs P, Brox T, Pock T (2015) iPiasco: inertial proximal algorithm for strongly convex optimization. J Math Imag Vision 53(2):171–181

  • Parikh N, Boyd S (2014) Proximal algorithms. Found Trends® Optim 1(3):127–239

  • Peng CJ, Lee KL, Ingersoll GM (2002) An introduction to logistic regression analysis and reporting. J Educ Res 96(1):3–14

  • Peng W, Zhang H, Zhang X (2019) Nonconvex proximal incremental aggregated gradient method with linear convergence. J Optim Theory Appl 183(1):230–245

  • Pock T, Sabach S (2016) Inertial proximal alternating linearized minimization (iPALM) for nonconvex and nonsmooth problems. SIAM J Imag Sci 9(4):1756–1787

  • Polyak BT (1964) Some methods of speeding up the convergence of iteration methods. USSR Comput Math Math Phys 4(5):1–17

  • Rockafellar R (1970) On the maximal monotonicity of subdifferential mappings. Pacific J Math 33(1):209–216

  • Scheinberg K, Goldfarb D, Bai X (2014) Fast first-order methods for composite convex optimization with backtracking. Found Comput Math 14(3):389–417

  • Simon N, Friedman J, Hastie T, Tibshirani R (2013) A sparse-group lasso. J Comput Graphical Stat 22(2):231–245

  • Vanli DN, Gurbuzbalaban M, Ozdaglar A (2018) Global convergence rate of proximal incremental aggregated gradient methods. SIAM J Optim 28(2):1282–1300

  • Wen B, Chen X, Pong TK (2017) Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems. SIAM J Optim 27(1):124–145

  • Yang Q, Liu Y, Chen T, Tong Y (2019) Federated machine learning: concept and applications. ACM Trans Intell Syst Technol (TIST) 10(2):1–19

  • Yu P, Li G, Pong TK (2021) Kurdyka-Łojasiewicz exponent via inf-projection. Found Comput Math, pp 1–47

  • Nesterov Y (2013) Introductory lectures on convex optimization: a basic course, vol 87. Springer Science & Business Media

  • Zhang H (2020) New analysis of linear convergence of gradient-type methods via unifying error bound conditions. Math Program 180(1):371–416

  • Zhang H, Dai Y, Guo L, Peng W (2021) Proximal-like incremental aggregated gradient method with linear convergence under Bregman distance growth conditions. Math Oper Res 46(1):61–81

Download references

Acknowledgements

We are grateful to the anonymous referees and the associate editor for many useful comments, which allowed us to significantly improve the original presentation. This work is supported by the National Science Foundation of China (No. 11971480), the Natural Science Fund of Hunan for Excellent Youth (No. 2020JJ3038), and the Fund for NUDT Young Innovator Awards (No. 20190105).

Author information

Corresponding author

Correspondence to Hui Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Proofs of Theorems and Lemmas

Proof of Theorem 1. From Lemma 2, for all \( k \ge 0\) we have

$$\begin{aligned} F(x_{k+1}) - F(x_k)&\le -\frac{1}{2\alpha }\Vert x_{k+1} -x_k\Vert ^2 +\frac{\beta }{\alpha }\langle x_k - x_{k-1},x_{k+1}-x_k\rangle + \\&\frac{L}{2} \sum \limits _{j=(k-\tau )_+}^{k-1}\Vert x_{j+1} - x_j\Vert ^2 . \end{aligned}$$

By combining the inequality above with Lemma 3, we get

$$\begin{aligned} (1+ \frac{\sigma \alpha }{8(2+\beta )}) (F(x_{k+1}) -F^{*})&\le (F(x_{k}) -F^{*}) \\&- \frac{1-2\beta }{4\alpha }\Vert x_{k+1}-x_{k}\Vert ^2 + \frac{3\beta }{4\alpha }\Vert x_k-x_{k-1}\Vert ^2 \\&+ \frac{3L}{4} \sum \limits _{j=(k-\tau )_+}^{k-1}\Vert x_{j+1} - x_j\Vert ^2, \forall k \ge 0. \end{aligned}$$

In order to apply Lemma 1, let \(V_k:=F(x_{k})-F^{*}\), \(\omega _k:=\Vert x_{k+1}-x_{k}\Vert ^2\), \(a := 1/(1+ \frac{\sigma \alpha }{8(2+\beta )}), \alpha _1 := 1, \alpha _2 := 0, b_1 := (\frac{1-2\beta }{4\alpha })a, b_2 := \frac{3\beta }{4\alpha }a, c := \frac{3L}{4}a, k_0 := \tau .\) For this choice to satisfy the conditions required by Lemma 1, the following inequalities must hold:

$$\begin{aligned} b_1-\frac{b_2}{a}>0, b_1>0 \end{aligned}$$

and

$$\begin{aligned} \frac{c}{1-a}\frac{1-a^{k_0+1}}{a^{k_0}} \le b_1 -\frac{b_2}{a} . \end{aligned}$$

The first condition is guaranteed by letting \(\beta < \min \left\{ \frac{16}{83}, \frac{1}{2} \right\} = \frac{16}{83} \), since

$$\begin{aligned} \frac{b_2}{a} = \frac{3\beta }{4\alpha }(1+ \frac{\sigma \alpha }{8(2+\beta )})a \le \frac{3\beta }{4\alpha } \left( 1 + \frac{\alpha L}{16}\right) a \le \frac{3\beta }{4\alpha } \left( 1 + \frac{1}{16}\right) a\le b_1, \end{aligned}$$

according to \(\alpha L \in (0,1]\) and \(\beta \in [0,1)\). The second condition is guaranteed by

$$\begin{aligned} \alpha \le \frac{8(2+\beta )}{\sigma }\left[ \left( \frac{1 - \frac{83}{16}\beta }{24(2+\beta )Q} +1\right) ^{\frac{1}{\tau +1}} -1 \right] . \end{aligned}$$

In fact, this bound guarantees the first inequality below, from which we can derive that

$$\begin{aligned} \frac{c}{1-a}\frac{1-a^{k_0+1}}{a^{k_0}}&=\left[ \left( \frac{\alpha \sigma }{8(2+\beta )} + 1\right) ^{\tau +1} -1\right] \frac{6(\beta +2)L}{\alpha \sigma (1+ \frac{\alpha \sigma }{8(2+\beta )} )} \\&\le \frac{1 - \frac{83}{16}\beta }{4\alpha (1+ \frac{\alpha \sigma }{8(2+\beta )} )}\\&\le \frac{1- 2\beta }{4\alpha (1+ \frac{\alpha \sigma }{8(2+\beta )} )} - \frac{3\beta }{4\alpha } \\&= b_1 -\frac{b_2}{a}, \end{aligned}$$

where the second inequality follows from \(\beta \ge 0\) and \(\alpha \sigma \le 1\). Hence, the second condition holds as well. Therefore, the claimed convergence follows from Lemma 1.
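
For concreteness, the admissible parameter region described by Theorem 1 can be evaluated numerically. The sketch below is an illustration only: it treats the constants \(\sigma\), \(L\), \(\tau\) and \(Q\) from the theorem statement as given inputs (the numerical values are hypothetical placeholders, not taken from the paper), and simply evaluates the inertial bound \(\beta < 16/83\), the step-size bound above, and the bound \(\alpha \le \frac{1}{(\tau +1)L}\) used in Lemma 2.

```python
# Illustrative evaluation of the Theorem 1 parameter bounds (not part of the proof).
# Assumption: sigma, L, tau, Q are the constants appearing in Theorem 1;
# the numerical values below are placeholders for a hypothetical problem instance.

def max_stepsize_thm1(beta, sigma, tau, Q):
    """alpha <= (8(2+beta)/sigma) * [((1 - 83*beta/16)/(24*(2+beta)*Q) + 1)**(1/(tau+1)) - 1]."""
    assert 0 <= beta < 16.0 / 83.0, "Theorem 1 requires beta < 16/83"
    inner = (1.0 - 83.0 * beta / 16.0) / (24.0 * (2.0 + beta) * Q) + 1.0
    return 8.0 * (2.0 + beta) / sigma * (inner ** (1.0 / (tau + 1)) - 1.0)

if __name__ == "__main__":
    sigma, L, tau, Q = 0.1, 1.0, 5, 10.0        # hypothetical constants
    for beta in (0.0, 0.05, 0.15):
        alpha_max = min(max_stepsize_thm1(beta, sigma, tau, Q), 1.0 / ((tau + 1) * L))
        print(f"beta = {beta:5.2f} -> admissible alpha <= {alpha_max:.3e}")
```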

Proof of Theorem 2. Since \({\mathcal {X}}\) is a nonempty closed convex set, the projection of any point \(z\) onto \({\mathcal {X}}\) is unique and denoted by \(z^{*}\). Note that \(F(x_k^{*})=F^{*}\). According to Lemma 4, we obtain

$$\begin{aligned} F(x_{k+1}) - F^{*} + \frac{1-\beta }{2\alpha }\Vert x_{k+1}-x_k^{*} \Vert ^2 \le&\frac{1}{2\alpha }\Vert x_k^{*} - x_{k}\Vert ^2 \nonumber \\&- \frac{1}{2\alpha }\Vert x_{k+1}-x_k\Vert ^2+ \frac{\beta }{2\alpha }\Vert x_{k}-x_{k-1}\Vert ^2 \nonumber \\&+ \frac{L(\tau +1)}{2}\sum _{j=(k-\tau )_+}^k\Vert x_{j+1}-x_{j}\Vert ^2, \forall k \ge 0 . \end{aligned}$$
(12)

Since \(x_k^{*}\in {\mathcal {X}}\), by the definition of projection, it holds that

$$\begin{aligned} \Vert x_k^{*}-x_{k+1}\Vert ^2 \ge \Vert x_{k+1}^{*}-x_{k+1}\Vert ^2 = d^2(x_{k+1}, {\mathcal {X}}), \forall k \ge 0. \end{aligned}$$

Now, in terms of the expression of the Lyapunov function \(\Psi \), we have

$$\begin{aligned} \Psi (x_{k+1})&\le \frac{1}{2\alpha }\Vert x_k^{*}-x_{k}\Vert ^2 -\frac{1}{2\alpha }\Vert x_{k+1}-x_{k} \Vert ^2\nonumber \\&+ \frac{\beta }{2\alpha }\Vert x_k-x_{k-1}\Vert ^2+ \frac{L(\tau +1)}{2}\sum _{j=(k-\tau )_+}^k\Vert x_{j+1}-x_{j}\Vert ^2, \forall k \ge 0. \end{aligned}$$
(13)

By using the quadratic growth condition, we obtain

$$\begin{aligned} \Vert x_k^{*}-x_k\Vert ^2 = d^2(x_{k}, {\mathcal {X}}) \le \frac{2}{\mu }(F(x_{k}) - F^{*}), \forall k \ge 0 \end{aligned}$$

and hence

$$\begin{aligned} \Vert x_k^{*}-x_{k}\Vert ^2 \le p\Vert x_k^{*}-x_{k}\Vert ^2 + \frac{2q}{\mu }(F(x_k)-F^{*}),\forall k \ge 0, \end{aligned}$$
(14)

with \(p+q=1, p, q \ge 0\). Picking \(p=\frac{1-\beta }{\alpha \mu +1-\beta },q=\frac{\alpha \mu }{\alpha \mu +1-\beta }\) and combining (13) and (14), we obtain

$$\begin{aligned} \Psi (x_{k+1})&\le \frac{1}{\alpha \mu +1-\beta }\Psi (x_k) -\frac{1}{2\alpha }\Vert x_{k+1}-x_{k} \Vert ^2+\frac{\beta }{2\alpha }\Vert x_k-x_{k-1}\Vert ^2 \nonumber \\&~~~~+ \frac{L(\tau +1)}{2}\sum _{j=(k-\tau )_+}^k\Vert x_{j+1}-x_{j}\Vert ^2, \forall k \ge 0. \end{aligned}$$
(15)

In order to apply Lemma 1, let \(V_k:=\Psi (x_k)\), \(\omega _k:=\Vert x_{k+1}-x_{k}\Vert ^2\), \(a := \frac{1}{\alpha \mu +1-\beta }, \alpha _1 := 1, \alpha _2 := 0, b_1:= \frac{1}{2\alpha }\), \(b_2 := \frac{\beta }{2\alpha }\), \(c:=\frac{L(\tau +1)}{2}\), \(k_0:=\tau \); we need the parameters to satisfy (5) and (6), that is

$$\begin{aligned} \left\{ \begin{aligned}&0< a < 1, \\&\frac{b_2}{a} \le b_1, \\&\frac{c}{1-a} \frac{1-a^{k_0+1}}{a^{k_0}} \le b_1-\frac{b_2}{a}. \end{aligned} \right. \end{aligned}$$
(16)

Since \(\alpha \le \frac{1}{L}\), if the parameters satisfy the following conditions

$$\begin{aligned}&0 \le \rho <1, \\&\alpha \le \min \left( \left[ \left( \frac{(1-\rho ) \mu r(\rho )}{L(\tau +1)} + 1 \right) ^{\frac{1}{\tau +1}} - 1 \right] /\mu , \frac{1}{L}\right) , \\&r(\rho ) := 1- \rho \frac{\mu }{L} - \rho (1-\rho )\frac{\mu ^2}{L^2}, \\&0 \le \beta \le \rho \alpha \mu , \end{aligned}$$

then we have that \(0< a < 1\), \(\frac{b_2}{a} \le b_1\) and

$$\begin{aligned} \frac{c}{1-a} \frac{1-a^{k_0+1}}{a^{k_0}}&=\frac{L(\tau +1)}{2} \frac{(\alpha \mu +1-\beta )^{\tau +1}-1}{\alpha \mu -\beta } \\&\le \frac{L(\tau +1)}{2} \frac{(\alpha \mu +1)^{\tau +1}-1}{(1-\rho )\alpha \mu } \\&\le \frac{1}{2\alpha } - \frac{\rho \alpha \mu }{2\alpha }(1+ \alpha \mu -\rho \alpha \mu ) \\&\le \frac{1}{2\alpha } - \frac{\beta }{2\alpha }(\alpha \mu +1-\beta )\\&=b_1-\frac{b_2}{a}. \end{aligned}$$

This indicates that (16) holds. Therefore, from Lemma 1, \(\Psi (x_k)\) converges linearly in the sense of (9). The results (10) and (11) directly follow from the definition of \(\Psi (x)\) and (9).
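
The parameter conditions displayed above are fully explicit, so feasible choices of \(\alpha\) and \(\beta\) can be computed directly from \(\rho\), \(\mu\), \(L\), and \(\tau\). The following sketch is an illustration only; the numerical constants are hypothetical placeholders, not values from the paper.

```python
# Illustrative evaluation of the parameter conditions in the proof of Theorem 2.
# Assumption: mu (quadratic growth), L (gradient Lipschitz constant), and tau
# (maximal delay) are given; the values below are placeholders.

def thm2_parameters(rho, mu, L, tau):
    """Return (alpha_max, beta_max) satisfying the displayed conditions."""
    assert 0.0 <= rho < 1.0
    r = 1.0 - rho * mu / L - rho * (1.0 - rho) * (mu / L) ** 2           # r(rho)
    alpha_max = min(
        (((1.0 - rho) * mu * r / (L * (tau + 1)) + 1.0) ** (1.0 / (tau + 1)) - 1.0) / mu,
        1.0 / L,
    )
    beta_max = rho * alpha_max * mu                                      # 0 <= beta <= rho*alpha*mu
    return alpha_max, beta_max

if __name__ == "__main__":
    mu, L, tau = 0.05, 1.0, 10                   # hypothetical constants
    for rho in (0.0, 0.3, 0.9):
        a_max, b_max = thm2_parameters(rho, mu, L, tau)
        print(f"rho = {rho:.1f}: alpha <= {a_max:.3e}, beta <= {b_max:.3e}")
```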

Proof of Lemma 1. Note that \(A=\alpha _1 a\) and \(B=\alpha _2 a^2\) in (5). The inequality (4) can be rewritten as:

$$\begin{aligned} V_{k+1}\le \alpha _1aV_k+\alpha _2a^2V_{k-1}-b_1\omega _k+b_2\omega _{k-1}+c\sum _{j=k-k_0}^k\omega _j. \end{aligned}$$
(17)

By dividing both sides of (17) by \(a^{k+1}\) and summing the resulting inequality from \(k=1\) to \(K\) (for any \(K \ge 1\)), we derive that

$$\begin{aligned} \sum _{k=1}^K \frac{V_{k+1}}{a^{k+1}} \le&\alpha _1\sum _{k=1}^K \frac{V_{k}}{a^k}+\alpha _2\sum _{k=0}^{K-1}\frac{V_{k}}{a^{k}}-b_1\sum _{k=1}^K\frac{\omega _k}{a^{k+1}} \nonumber \\&+\frac{b_2}{a}\sum _{k=0}^{K-1}\frac{\omega _{k}}{a^{k+1}}+\sum _{k=1}^{K}\left[ \frac{c}{a^{k+1}}\sum _{j=k-k_0}^k\omega _j\right] \nonumber \\&\le \sum _{k=0}^K\frac{V_k}{a^k}-\left( b_1-\frac{b_2}{a}\right) \sum _{k=0}^K\frac{\omega _k}{a^{k+1}}\nonumber \\&+\sum _{k=0}^{K}\left[ \frac{c}{a^{k+1}}\sum _{j=k-k_0}^k\omega _j\right] +\frac{b_1\omega _0}{a} . \end{aligned}$$
(18)

Since \(\omega _k=0\) for \(k<0\), \(\omega _k \ge 0\) for \(k \ge 0\), and \(a>0\), we get

$$\begin{aligned} \sum _{k=0}^{K}\left[ \frac{c}{a^{k+1}}\sum _{j=k-k_0}^k\omega _j\right]&\le \frac{c}{a}\left( \frac{1}{a^{-k_0}} + \frac{1}{a^{-k_0+1}}+ \cdots + \frac{1}{a^{0}} \right) \omega _{-k_0} \nonumber \\&~~~~+ \frac{c}{a}\left( \frac{1}{a^{-k_0+1}} + \frac{1}{a^{-k_0+2}} + \cdots + \frac{1}{a^{1}} \right) \omega _{-k_0+1} \nonumber \\&~~~~ + \cdots + \frac{c}{a}\left( \frac{1}{a^{-k_0+K}} + \frac{1}{a^{-k_0+K+1}}+ \cdots + \frac{1}{a^{K}} \right) \omega _{-k_0+K}\nonumber \\&~~~~ + \cdots + \frac{c}{a}\left( \frac{1}{a^{K}} + \frac{1}{a^{K+1}} + \cdots + \frac{1}{a^{K+k_0}} \right) \omega _{K}\nonumber \\&\le \sum _{j=0}^{K+k_0}\frac{c}{a}\left( \frac{1}{a^{j-k_0}}+\dots + \frac{1}{a^{j}}\right) \omega _{j-k_0}\nonumber \\&= \sum _{k=-k_0}^{K}\frac{c}{a}\left( \frac{1}{a^{k}}+\dots + \frac{1}{a^{k+k_0}}\right) \omega _{k}\nonumber \\&= \sum _{k=0}^{K}\frac{c}{a}\left( \frac{1}{a^{k}}+\dots + \frac{1}{a^{k+k_0}}\right) \omega _{k}. \end{aligned}$$
(19)

Therefore, combining (19) and (18), we have that

$$\begin{aligned}\sum _{k=1}^K \frac{V_{k+1}}{a^{k+1}}&\le \sum _{k=0}^K\frac{V_k}{a^k}+ \frac{b_1\omega _0}{a}+\sum _{k=0}^K\left[ c\left( 1+\frac{1}{a}+\cdots +\frac{1}{a^{k_0}}\right) -\left( b_1-\frac{b_2}{a}\right) \right] \frac{\omega _k}{a^{k+1}}\\&= \sum _{k=0}^K\frac{V_k}{a^k} + \frac{b_1\omega _0}{a}+ \sum _{k=0}^K\left[ \frac{c}{1-a}\frac{1-a^{k_0+1}}{a^{k_0}}-\left( b_1-\frac{b_2}{a}\right) \right] \frac{\omega _k}{a^{k+1}}. \end{aligned}$$

By condition (6), we obtain

$$\begin{aligned} \sum _{k=1}^K \frac{V_{k+1}}{a^{k+1}} \le \sum _{k=0}^K\frac{V_k}{a^k}+ \frac{b_1\omega _0}{a}, \end{aligned}$$
(20)

that is, \(\frac{V_{K+1}}{a^{K+1}} \le V_0 + \frac{V_1}{a}+ \frac{b_1\omega _0}{a}\) for \(K \ge 1\). Moreover, \(V_1 \le V_1+aV_0+b_1\omega _0\). Therefore, for all \(K\ge 1\), we have

$$\begin{aligned} V_{K}\le a^{K-1}\left( V_1+aV_0+b_1\omega _0\right) . \end{aligned}$$

This completes the proof.
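
Lemma 1 can also be checked numerically. The sketch below is only a sanity check, not part of the proof: it simulates the recursion (4) with equality, with \(\alpha_1 = 1\), \(\alpha_2 = 0\) as in the applications above and hypothetical constants \(a, b_1, b_2, c, k_0\) chosen to satisfy the conditions collected in (16), and verifies the conclusion \(V_K \le a^{K-1}(V_1 + aV_0 + b_1\omega_0)\).

```python
# Numerical sanity check of Lemma 1 (illustration only).
# Hypothetical constants chosen to satisfy the conditions collected in (16),
# with alpha_1 = 1 and alpha_2 = 0.
a, b1, b2, c, k0 = 0.9, 1.0, 0.0, 0.2, 3

assert 0 < a < 1 and b2 / a <= b1                                   # first two conditions in (16)
assert c / (1 - a) * (1 - a ** (k0 + 1)) / a ** k0 <= b1 - b2 / a   # third condition in (16)

K = 50
V = [1.0, 1.0]                                  # V_0, V_1
omega = {k: 0.0 for k in range(-k0, 0)}         # omega_k = 0 for k < 0
omega[0], omega[1] = 0.05 * V[0], 0.05 * V[1]   # nonnegative "step lengths"

for k in range(1, K):
    delayed = sum(omega[j] for j in range(k - k0, k + 1))
    # recursion (4) taken with equality (worst case)
    V.append(a * V[k] - b1 * omega[k] + b2 * omega[k - 1] + c * delayed)
    omega[k + 1] = 0.05 * V[k + 1]              # stays nonnegative for this instance

def bound(K_):
    return a ** (K_ - 1) * (V[1] + a * V[0] + b1 * omega[0])

assert all(V[K_] <= bound(K_) + 1e-12 for K_ in range(1, K + 1))
print("Lemma 1 bound V_K <= a^(K-1)(V_1 + a V_0 + b_1 omega_0) verified up to K =", K)
```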

Proof of Lemma 2. By using the L-gradient Lipschitz continuity of f, we have

$$\begin{aligned} f(x_{k+1}) - f(x_k)&\le \langle \nabla f(x_k) , x_{k+1} - x_k \rangle + \frac{L}{2} \Vert x_{k+1} - x_k\Vert ^2, \forall k \ge 0 . \end{aligned}$$
(21)

Note also the gradient inequality for the convex function f,

$$\begin{aligned} f(x) - f(x_{k})&\ge \langle \nabla f(x_k) , x - x_k \rangle , \forall k \ge 0. \end{aligned}$$
(22)

By combining (21) and (22), we have

$$\begin{aligned} f(x_{k+1}) - f(x)&\le \langle \nabla f(x_k) , x_{k+1} - x\rangle + \frac{L}{2} \Vert x_{k+1} - x_k\Vert ^2, \forall k \ge 0. \end{aligned}$$
(23)

Since \(x_{k+1}\) is the minimizer of the \(\frac{1}{\alpha }\)-strongly convex function:

$$\begin{aligned} z \mapsto h(z) + \frac{1}{2\alpha }\Vert z-(x_k+\beta (x_k-x_{k-1})-\alpha g_k)\Vert ^2, \end{aligned}$$

where \( g_k =\sum _{n=1}^N \nabla f_n(x_{k-\tau _{k}^n}) \), we have

$$\begin{aligned}&h(x_{k+1}) + \frac{1}{2\alpha }\Vert x_{k+1}-(x_k+\beta (x_k-x_{k-1})-\alpha g_k)\Vert ^2 \nonumber \\&\le h(x) + \frac{1}{2\alpha }\Vert x-(x_k+\beta (x_k-x_{k-1}) -\alpha g_k)\Vert ^2 - \frac{1}{2\alpha }\Vert x-x_{k+1}\Vert ^2, \forall k \ge 0. \end{aligned}$$
(24)

By combining (23) and (24), we derive that

$$\begin{aligned} F(x_{k+1}) - F(x)&\le \langle \nabla f(x_k) - g_k, x_{k+1} - x\rangle + \frac{1}{2\alpha }\Vert x - x_k\Vert ^2 - \frac{1}{2\alpha }\Vert x -x_{k+1}\Vert ^2 \nonumber \\&\qquad - \frac{1}{2\alpha }\Vert x_{k+1} - x_k\Vert ^2 +\frac{\beta }{\alpha }\langle x_k - x_{k-1},x_{k+1}-x\rangle \nonumber \\&\qquad + \frac{L}{2}\Vert x_{k+1} - x_k\Vert ^2 \le \Vert \nabla f(x_k) - g_k\Vert \Vert x_{k+1} - x\Vert \nonumber \\&\qquad + \frac{1}{2\alpha }\Vert x - x_k\Vert ^2 - \frac{1}{2\alpha }\Vert x -x_{k+1}\Vert ^2 - \frac{1}{2\alpha }\Vert x_{k+1} - x_k\Vert ^2 \nonumber \\&\qquad +\frac{\beta }{\alpha }\langle x_k - x_{k-1},x_{k+1}-x\rangle + \frac{L}{2}\Vert x_{k+1} - x_k\Vert ^2, \forall k \ge 0, \end{aligned}$$

where the second inequality follows from the Cauchy-Schwarz inequality. Note that \(\Vert \nabla f(x_k) - g_k\Vert \le L \sum \limits _{j=(k-\tau )_+}^{k-1} \Vert x_{j+1} - x_j\Vert \). Then, by the elementary inequality \(st \le \frac{1}{2}(s^2+t^2)\) applied to each term, we have that

$$\begin{aligned} \Vert \nabla f(x_k) - g_k\Vert \Vert x_{k+1} - x\Vert&\le \frac{L}{2} \sum \limits _{j=(k-\tau )_+}^{k-1} (\Vert x_{j+1} - x_j\Vert ^2 + \Vert x_{k+1} - x\Vert ^2),\forall k \ge 0. \end{aligned}$$

Hence

$$\begin{aligned} F(x_{k+1}) - F(x)&\le \left( \frac{\tau L}{2} - \frac{1}{2\alpha }\right) \Vert x- x_{k+1}\Vert ^2 + \frac{1}{2\alpha } \Vert x - x_k\Vert ^2 \\&\qquad + \left( \frac{L}{2}-\frac{1}{2\alpha }\right) \Vert x_{k+1} -x_k\Vert ^2 \\&\qquad + \frac{\beta }{\alpha }\langle x_k - x_{k-1},x_{k+1}-x\rangle + \frac{L}{2} \sum \limits _{j=(k-\tau )_+}^{k-1}\Vert x_{j+1} - x_j\Vert ^2 \\&\le - \frac{L}{2} \Vert x- x_{k+1}\Vert ^2 + \frac{1}{2\alpha } \Vert x - x_k\Vert ^2 + \left( \frac{L}{2}-\frac{1}{2\alpha }\right) \Vert x_{k+1} -x_k\Vert ^2 \\&\qquad + \frac{\beta }{\alpha }\langle x_k - x_{k-1},x_{k+1}-x\rangle + \frac{L}{2} \sum \limits _{j=(k-\tau )_+}^{k-1}\Vert x_{j+1} - x_j\Vert ^2, \forall k \ge 0, \end{aligned}$$

where the last inequality uses the bound \(\alpha \le \frac{1}{(\tau +1)L}\). Setting \(x = x_k\) for all \(k \ge 0 \), we then have

$$\begin{aligned} F(x_{k+1})&- F(x_k) \le -\frac{1}{2\alpha } \Vert x_{k+1} - x_k\Vert ^2 +\frac{\beta }{\alpha }\langle x_k - x_{k-1},x_{k+1}-x_k\rangle \\&+ \frac{L}{2} \sum \limits _{j=(k-\tau )_+}^{k-1}\Vert x_{j+1} - x_j\Vert ^2 . \end{aligned}$$
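
The update analyzed in this lemma, \(x_{k+1} = \text{prox}_h^\alpha \big(x_k + \beta (x_k - x_{k-1}) - \alpha g_k\big)\) with the aggregated delayed gradient \(g_k = \sum _{n=1}^N \nabla f_n(x_{k-\tau _k^n})\), is easy to state in code. The sketch below is illustrative only: it assumes least-squares components \(f_n(x) = \frac{1}{2}\Vert A_n x - b_n\Vert ^2\), an \(\ell _1\) regularizer \(h(x) = \lambda \Vert x\Vert _1\) (whose proximal map is soft-thresholding), and a cyclic schedule that refreshes one component gradient per iteration; none of these modeling choices are prescribed by the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal map of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ipiag_l1(A_blocks, b_blocks, lam, alpha, beta, x0, iters=500):
    """Sketch of the inertial PIAG iteration for sum_n 0.5*||A_n x - b_n||^2 + lam*||x||_1.
    Only one component gradient is refreshed per iteration (cyclic order), so the
    aggregated gradient g_k uses delayed information for the remaining components."""
    N = len(A_blocks)
    x_prev, x = x0.copy(), x0.copy()
    grads = [A.T @ (A @ x0 - b) for A, b in zip(A_blocks, b_blocks)]  # gradients at x0
    g = sum(grads)
    for k in range(iters):
        n = k % N                                      # component refreshed at iteration k
        new_grad = A_blocks[n].T @ (A_blocks[n] @ x - b_blocks[n])
        g = g - grads[n] + new_grad                    # aggregated (partly delayed) gradient g_k
        grads[n] = new_grad
        y = x + beta * (x - x_prev) - alpha * g        # inertial step plus gradient step
        x_prev, x = x, soft_threshold(y, alpha * lam)  # proximal step
    return x

# Hypothetical usage on random data; the step size is chosen conservatively.
rng = np.random.default_rng(0)
A_blocks = [rng.standard_normal((20, 50)) for _ in range(5)]
b_blocks = [rng.standard_normal(20) for _ in range(5)]
L_est = sum(np.linalg.norm(A, 2) ** 2 for A in A_blocks)  # crude bound on the gradient Lipschitz constant
x_hat = ipiag_l1(A_blocks, b_blocks, lam=0.1, alpha=1.0 / (6.0 * L_est), beta=0.1, x0=np.zeros(50))
```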

Proof of Lemma 3. Denote \(x_{k+1}^\prime = \text {prox}_h^\alpha (x_k - \alpha \nabla f(x_k))\). The firm nonexpansivity of the proximal map (see Theorem 6.42 in Beck 2017) implies

$$\begin{aligned} \Vert \text {prox}_h^\alpha (x) - \text {prox}_h^\alpha (y)\Vert ^2 \le \langle \text {prox}_h^\alpha (x) - \text {prox}_h^\alpha (y), x-y\rangle . \end{aligned}$$
(25)

Taking \(x=x_k -\alpha \nabla f(x_k)\) and \(y = x^{*} -\alpha \nabla f( x^{*})\) (so that \( \text {prox}_h^\alpha (y) = x^{*}\)) in (25) yields

$$\begin{aligned} \Vert x_{k+1}^\prime -x^{*} \Vert ^2&\le \langle x_{k+1}^\prime -x^{*} , x_k -\alpha \nabla f(x_k) - x^{*} + \alpha \nabla f( x^{*}) \rangle \\&= \langle x_{k+1}^\prime -x^{*}, x_{k+1}^\prime -x^{*} \rangle + \langle x_{k+1}^\prime -x^{*}, - x_{k+1}^\prime \\&\quad + x_k -\alpha \nabla f(x_k) + \alpha \nabla f( x^{*}) \rangle , \end{aligned}$$

which implies \( 0 \le \langle x_{k+1}^\prime -x^{*}, - x_{k+1}^\prime + x_k -\alpha \nabla f(x_k) + \alpha \nabla f( x^{*}) \rangle \) for all \( k \ge 0\). This inequality can be rewritten as follows:

$$\begin{aligned} \alpha \langle x_k -x^{*} , \nabla f(x_k) - \nabla f( x^{*})\rangle&\le \langle x_k -x^{*}, x_k -x_{k+1}^\prime \rangle - \Vert x_{k+1}^\prime -x_k \Vert ^2 \nonumber \\&~~~~~~~~+ \alpha \langle x_{k+1}^\prime -x_k ,\nabla f( x^{*})-\nabla f(x_k) \rangle \nonumber \\&\le \Vert x_{k+1}^\prime -x_k \Vert \left( \Vert x_k -x^{*}\Vert + \alpha \Vert \nabla f( x^{*})- \nabla f(x_k) \Vert \right) \nonumber \\&\le \Vert x_{k+1}^\prime -x_k\Vert \left( \Vert x_k -x^{*}\Vert + \alpha L \Vert x_k -x^{*} \Vert \right) \nonumber \\&= (\alpha L +1)\Vert x_{k+1}^\prime -x_k \Vert \Vert x_k -x^{*}\Vert , \forall k \ge 0, \end{aligned}$$
(26)

where the second inequality uses the nonpositivity of \(- \Vert x_{k+1}^\prime -x_k \Vert ^2 \) together with the Cauchy-Schwarz inequality, and the third inequality follows from the L-gradient Lipschitz continuity of f. Since \( \langle \nabla f(x_k) - \nabla f(x^{*}), x_k -x^{*} \rangle \ge \sigma \Vert x_k - x^{*}\Vert ^2\) by the strong convexity of f, and since \(\alpha L \le 1\), we obtain

$$\begin{aligned} \sigma \Vert x_k - x^{*}\Vert \le \frac{\alpha L+1}{\alpha }\Vert x_{k+1}^\prime -x_k \Vert \le \frac{2}{\alpha }\Vert x_{k+1}^\prime -x_k \Vert , \forall k \ge 0. \end{aligned}$$

Thus,

$$\begin{aligned} \frac{\sigma }{2}\Vert x_k - x^{*}\Vert&\le \frac{1}{\alpha }\Vert x_{k+1}^\prime - x_{k+1}\Vert + \frac{1}{\alpha }\Vert x_{k+1} - x_k\Vert \nonumber \\&= \frac{1}{\alpha }\Vert \text {prox}_h^\alpha (x_k - \alpha \nabla f(x_k)) - \text {prox}_h^\alpha (x_k - \alpha g_k + \beta (x_k - x_{k-1})) \Vert \nonumber \\&\quad + \frac{1}{\alpha }\Vert x_{k+1} - x_k\Vert \nonumber \\&\le \Vert \nabla f(x_k) - g_k + \frac{\beta }{\alpha }(x_k -x_{k-1})\Vert + \frac{1}{\alpha }\Vert x_{k+1} - x_k\Vert , \forall k \ge 0, \end{aligned}$$
(27)

where the second inequality follows from the nonexpansiveness of the proximal map, \(\Vert \text {prox}_h^\alpha (x) - \text {prox}_h^\alpha (y) \Vert \le \Vert x-y\Vert \) for any \(x, y \in {\mathbb {R}}^d\). After rearranging the terms and multiplying both sides by \(\Vert x_{k+1} - x_k\Vert \), we obtain

$$\begin{aligned} -\Vert x_{k+1} - x_k\Vert ^2 \le&-\frac{\alpha \sigma }{2}\Vert x_k - x^{*}\Vert \Vert x_{k+1} - x_k\Vert \nonumber \\&~~~~ + \Vert \alpha ( \nabla f(x_k) - g_k )+ \beta (x_k -x_{k-1} )\Vert \Vert x_{k+1} - x_k\Vert , \forall k \ge 0. \end{aligned}$$
(28)

The first term \( -\frac{\alpha \sigma }{2}\Vert x_k - x^{*}\Vert \Vert x_{k+1} - x_k\Vert \) of the right-hand side can be bounded as follows:

$$\begin{aligned} - \Vert x_k - x^{*}\Vert \Vert x_{k+1} - x_k\Vert&\le - \langle x^{*} - x_k, x_{k+1}-x_k\rangle \\&= - \Vert x_{k+1}- x_k\Vert ^2 - \langle x^{*} - x_{k+1}, x_{k+1}-x_k\rangle \\&= - \Vert x_{k+1}- x_k\Vert ^2 \\&\quad + \langle x_{k+1}-x^{*}, \beta (x_{k}-x_{k-1}) + \alpha (-g_k-r_{k+1})\rangle , \end{aligned}$$

where \(r_{k+1} \in \partial h(x_{k+1})\). By the convexity of h and (23), we have \( \langle x_{k+1}-x^{*}, -\alpha r_{k+1}\rangle \le \alpha (h(x^{*}) - h(x_{k+1})) \) and \(f(x_{k+1}) - f(x^{*}) \le \langle \nabla f(x_k) , x_{k+1} - x^{*} \rangle + \frac{L}{2} \Vert x_{k+1} - x_k\Vert ^2\). Denoting \(F_k := F(x_k)-F(x^{*})\), we can then derive that

$$\begin{aligned}&- \Vert x_k - x^{*}\Vert \Vert x_{k+1} - x_k\Vert \nonumber \\&\le - \Vert x_{k+1}- x_k\Vert ^2 + \langle x_{k+1}-x^{*}, \beta (x_{k}-x_{k-1}) \rangle - \alpha F_{k+1} + \alpha \langle g_k \nonumber \\&\quad - \nabla f(x_{k}), x^{*}-x_{k+1}\rangle + \frac{\alpha L}{2}\Vert x_{k+1}-x_k\Vert ^2 \nonumber \\&\le -\alpha F_{k+1} + \left( \frac{\alpha L}{2} -1 \right) \Vert x_{k+1}- x_k\Vert ^2 + \Vert x_{k+1}-x^{*}\Vert \Vert \alpha (\nabla f(x_k) - g_k )\nonumber \\&\quad + \beta (x_{k}-x_{k-1}) \Vert \le -\alpha F_{k+1} + \left( \frac{\alpha L}{2} -1 \right) \Vert x_{k+1}- x_k\Vert ^2 \nonumber \\&\quad + \frac{2}{\alpha \sigma } \Vert \alpha (\nabla f(x_k) - g_k )+ \beta (x_{k}-x_{k-1}) \Vert ^2 \nonumber \\&\quad + \frac{2 + \alpha \sigma }{\alpha \sigma } \Vert x_{k+1}-x_k\Vert \Vert \alpha (\nabla f(x_k) - g_k )+ \beta (x_{k}-x_{k-1}) \Vert , \forall k \ge 0, \end{aligned}$$
(29)

where the last inequality is from (27). By combining (29) and (28), we get

$$\begin{aligned} -\Vert x_{k+1} - x_k\Vert ^2&\le -\frac{\alpha ^2\sigma }{2}F_{k+1} + \frac{\alpha \sigma }{2}\left( \frac{\alpha L}{2} -1 \right) \Vert x_{k+1}- x_k\Vert ^2 \nonumber \\&~~+ \Vert \alpha (\nabla f(x_k) - g_k )+ \beta (x_{k}-x_{k-1}) \Vert ^2 \nonumber \\&~~+ \left( 2 + \frac{ \alpha \sigma }{2} \right) \Vert x_{k+1}-x_k\Vert \Vert \alpha (\nabla f(x_k) - g_k )+ \beta (x_{k}-x_{k-1}) \Vert . \end{aligned}$$
(30)

Since \( \Vert \nabla f(x_k) - g_k\Vert \le L \sum \limits _{j=(k-\tau )_+}^{k-1} \Vert x_{j+1} - x_j\Vert \), we have that

$$\begin{aligned}&\Vert \alpha (\nabla f(x_k) - g_k )+ \beta (x_{k}-x_{k-1}) \Vert ^2 \nonumber \\&\le \left( \alpha L \sum \limits _{j=(k-\tau )_+}^{k-1} \Vert x_{j+1} - x_j\Vert + \beta \Vert x_{k}-x_{k-1}\Vert \right) ^2 \nonumber \\&\le \left( \tau \alpha ^2L^2 +\alpha L\beta \right) \sum \limits _{j=(k-\tau )_+}^{k-1} \Vert x_{j+1} - x_j\Vert ^2 +(\alpha \tau L\beta + \beta ^2) \Vert x_{k}-x_{k-1}\Vert ^2, \end{aligned}$$
(31)

and

$$\begin{aligned}&\Vert x_{k+1}-x_k\Vert \Vert \alpha (\nabla f(x_k) - g_k )+ \beta (x_{k}-x_{k-1}) \Vert \nonumber \\&\le \Vert x_{k+1}-x_k\Vert \left( \alpha L \sum \limits _{j=(k-\tau )_+}^{k-1} \Vert x_{j+1} - x_j\Vert + \beta \Vert x_{k}-x_{k-1}\Vert \right) \nonumber \\&\le \frac{ \alpha L}{2}\sum \limits _{j=(k-\tau )_+}^{k-1} \Vert x_{j+1} - x_j\Vert ^2 +\frac{\tau \alpha L + \beta }{2}\Vert x_{k+1}-x_{k}\Vert ^2+ \frac{\beta }{2} \Vert x_{k}-x_{k-1}\Vert ^2. \end{aligned}$$
(32)

Putting (31), (32), and (30) together yields

$$\begin{aligned} -\Vert x_{k+1} - x_k\Vert ^2&\le -\frac{\alpha ^2\sigma }{2}F_{k+1} +\left[ \frac{\alpha \sigma }{2}\left( \frac{\alpha L}{2} -1\right) + \left( 2 + \frac{\alpha \sigma }{2}\right) \frac{\tau \alpha L+\beta }{2} \right] \Vert x_{k+1}- x_k\Vert ^2 \\&\quad + \alpha L \left[ \tau \alpha L + \beta + \left( 2 + \frac{\alpha \sigma }{2}\right) \frac{1}{2} \right] \sum \limits _{j=(k-\tau )_+}^{k-1} \Vert x_{j+1} - x_j\Vert ^2 \\&\quad + \beta \left[ \left( 2 + \frac{\alpha \sigma }{2}\right) \frac{1}{2} + \alpha \tau L +\beta \right] \Vert x_{k}-x_{k-1} \Vert ^2, \forall k \ge 0. \end{aligned}$$

Since \(\alpha \le \frac{1}{(\tau +1)L}, 0 \le \beta < 1\), we have the following bounds

$$\begin{aligned}&\frac{\alpha \sigma }{2}\left( \frac{\alpha L}{2} -1\right) + \left( 2 + \frac{\alpha \sigma }{2}\right) \frac{\tau \alpha L+\beta }{2} \le \frac{\alpha \sigma }{2}\frac{\beta -1}{2}+ \tau \alpha L +\beta \le \beta + 1, \\&\alpha L\left[ \tau \alpha L + \beta + (2+\frac{\alpha \sigma }{2})\frac{1}{2} \right] \le \alpha L \left( 1- \alpha L + \beta + 1 +\frac{\alpha L}{4}\right) \le \alpha L ( \beta + 2 ), \\&\left( 2 + \frac{ \alpha L}{2} \right) \frac{\beta }{2} + \tau \alpha L \beta +\beta ^2 \le \beta ( \beta + 2). \end{aligned}$$

Hence, for \( k \ge 0\) it holds that

$$\begin{aligned} -\Vert x_{k+1} - x_k\Vert ^2 \le&-\frac{\sigma }{2(2+\beta )}\alpha ^2 F_{k+1} + \alpha L\sum \limits _{j=(k-\tau )_+}^{k-1}\Vert x_{j+1} - x_j\Vert ^2 +\beta \Vert x_k-x_{k-1}\Vert ^2. \end{aligned}$$

This completes the proof.
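
The firm nonexpansivity inequality (25), which drives the above argument, can be checked numerically for any concrete proximal map. The sketch below is illustrative only; it uses the soft-thresholding operator, i.e. the proximal map of \(\lambda \Vert \cdot \Vert _1\), as a stand-in for a general convex h.

```python
import numpy as np

def prox_l1(v, t):
    """Proximal map of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Check the firm nonexpansivity inequality (25):
#   ||prox(x) - prox(y)||^2 <= <prox(x) - prox(y), x - y>
rng = np.random.default_rng(1)
t = 0.3
for _ in range(1000):
    x, y = rng.standard_normal(10), rng.standard_normal(10)
    d = prox_l1(x, t) - prox_l1(y, t)
    assert d @ d <= d @ (x - y) + 1e-12
print("firm nonexpansivity (25) holds on all random samples")
```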

Proof of Lemma 4. Since each component function \(f_n(x)\) is convex with \(L_n\)-Lipschitz continuous gradient, we derive that

$$\begin{aligned} f_n(x_{k+1})&\le f_n(x_{k-\tau _k^n}) + \langle \nabla f_n(x_{k-\tau _k^n}), x_{k+1}- x_{k-\tau _k^n} \rangle + \frac{L_n}{2} \Vert x_{k+1}- x_{k-\tau _k^n}\Vert ^2 \nonumber \\&\le f_n(x) + \langle \nabla f_n(x_{k-\tau _k^n}), x_{k+1}- x \rangle + \frac{L_n}{2} \Vert x_{k+1}- x_{k-\tau _k^n}\Vert ^2, n=1,\cdots , N, \end{aligned}$$
(33)

where the second inequality follows from the convexity of \(f_n(x)\). By summing (33) over all component functions and using the expression of \(g_k\), we have

$$\begin{aligned} f(x_{k+1}) \le f(x) + \langle g_k, x_{k+1}- x \rangle + \sum _{n=1}^N \frac{L_n}{2} \Vert x_{k+1}- x_{k-\tau _k^n}\Vert ^2, \forall k \ge 0. \end{aligned}$$
(34)

Note that \(x_{k+1}\) is the minimizer of the \(\frac{1}{\alpha }\)-strongly convex function:

$$\begin{aligned} z \mapsto h(z) + \frac{1}{2\alpha }\Vert z-(x_k+\beta (x_k-x_{k-1}) -\alpha g_k)\Vert ^2, \forall k \ge 0. \end{aligned}$$

Then, we have

$$\begin{aligned}&h(x_{k+1}) + \frac{1}{2\alpha }\Vert x_{k+1}-\left( x_k+\beta (x_k-x_{k-1}) -\alpha g_k\right) \Vert ^2 \nonumber \\ \le&h(x) + \frac{1}{2\alpha }\Vert x-(x_k+\beta (x_k-x_{k-1})-\alpha g_k)\Vert ^2-\frac{1}{2\alpha }\Vert x-x_{k+1}\Vert ^2. \end{aligned}$$
(35)

After rearranging the terms of (35), we further get

$$\begin{aligned} \langle x_{k+1}-x, g_k \rangle \le&h(x) - h(x_{k+1})+ \frac{1}{2\alpha }\Vert x-x_k-\beta (x_k-x_{k-1})\Vert ^2 -\frac{1}{2\alpha }\Vert x_{k+1}-x\Vert ^2 \nonumber \\&~~~~-\frac{1}{2\alpha }\Vert x_{k+1}-x_k-\beta (x_k-x_{k-1})\Vert ^2, \forall k \ge 0, \forall x \in {\mathbb {R}}^d. \end{aligned}$$
(36)

By combining (34) and (36), we derive that

$$\begin{aligned} F(x_{k+1})&\le F(x) + \frac{1}{2\alpha }\Vert x- x_k-\beta (x_k-x_{k-1})\Vert ^2 - \frac{1}{2\alpha }\Vert x_{k+1}-x\Vert ^2 \nonumber \\&~~~~~~~~-\frac{1}{2\alpha }\Vert x_{k+1}-x_k-\beta (x_k-x_{k-1})\Vert ^2 + \sum _{n=1}^N \frac{L_n}{2} \Vert x_{k+1}- x_{k-\tau _k^n}\Vert ^2 \nonumber \\&= F(x) + \frac{1}{2\alpha }\Vert x- x_k\Vert ^2 -\frac{1}{2\alpha }\Vert x_{k+1}-x_{k}\Vert ^2 + \frac{\beta }{\alpha } \langle x_{k+1}-x, x_k-x_{k-1} \rangle \nonumber \\&~~~~~~~~ - \frac{1}{2\alpha }\Vert x_{k+1}-x\Vert ^2 + \sum _{n=1}^N \frac{L_n}{2} \Vert x_{k+1}- x_{k-\tau _k^n}\Vert ^2 \nonumber \\&\le F(x) + \frac{1}{2\alpha }\Vert x- x_{k}\Vert ^2 - \frac{1}{2\alpha }\Vert x_{k+1}-x_k\Vert ^2 - \frac{1}{2\alpha }\Vert x_{k+1}-x\Vert ^2 \nonumber \\&~~~~~~~~+ \frac{\beta }{2\alpha }\left( \Vert x-x_{k+1}\Vert ^2+\Vert x_{k}-x_{k-1}\Vert ^2\right) + \sum _{n=1}^N \frac{L_n}{2} \Vert x_{k+1}- x_{k-\tau _k^n}\Vert ^2. \end{aligned}$$
(37)

According to Jensen's inequality, we obtain \(\sum _{n=1}^N\frac{L_n}{2}\Vert x_{k+1}-x_{k-\tau _k^n}\Vert ^2 = \sum _{n=1}^N\frac{L_n}{2}\Vert x_{k+1}-x_{k}+ \cdots +x_{k+1-\tau _k^n} -x_{k-\tau _k^n}\Vert ^2 \le \frac{L(\tau +1)}{2}\sum _{j=(k-\tau )_+}^k\Vert x_{j+1}-x_{j}\Vert ^2\). Therefore, the desired inequality follows. This completes the proof.
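
The final bound rests on the elementary estimate \(\Vert \sum _{j=1}^{m} d_j \Vert ^2 \le m \sum _{j=1}^{m} \Vert d_j \Vert ^2\) for any m vectors \(d_j\), applied to the telescoping differences \(x_{j+1} - x_j\). A minimal numerical check of that estimate (illustrative only):

```python
import numpy as np

# Check ||sum_j d_j||^2 <= m * sum_j ||d_j||^2 for m random vectors d_j,
# the estimate used above to bound ||x_{k+1} - x_{k - tau_k^n}||^2.
rng = np.random.default_rng(2)
for _ in range(1000):
    m = int(rng.integers(1, 12))
    D = rng.standard_normal((m, 8))                    # rows are the differences d_j
    lhs = np.linalg.norm(D.sum(axis=0)) ** 2
    rhs = m * float((np.linalg.norm(D, axis=1) ** 2).sum())
    assert lhs <= rhs + 1e-9
print("norm-of-sum estimate verified on all samples")
```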

About this article

Cite this article

Zhang, X., Peng, W. & Zhang, H. Inertial proximal incremental aggregated gradient method with linear convergence guarantees. Math Meth Oper Res 96, 187–213 (2022). https://doi.org/10.1007/s00186-022-00790-0

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00186-022-00790-0

Keywords

Mathematics Subject Classification

Navigation