Inertial proximal incremental aggregated gradient method with linear convergence guarantees

Zhang, Xiaoya; Peng, Wei; Zhang, Hui

doi:10.1007/s00186-022-00790-0


Original Article
Published: 25 June 2022

Volume 96, pages 187–213, (2022)

Mathematical Methods of Operations Research

Xiaoya Zhang¹,
Wei Peng¹ &
Hui Zhang²


Abstract

In this paper, we propose an inertial version of the Proximal Incremental Aggregated Gradient method (abbreviated as iPIAG) for minimizing the sum of smooth convex component functions and a possibly nonsmooth convex regularization function. First, we prove that iPIAG converges linearly under gradient Lipschitz continuity and strong convexity, and we give an upper bound on the inertial parameter. Then, by employing the recent Lyapunov-function-based method, we derive a linear convergence guarantee under weaker assumptions, in which strong convexity is replaced by the quadratic growth condition. Finally, we present two numerical tests to illustrate that iPIAG outperforms the original PIAG.



Notes

https://github.com/tiepvupsu/FISTA.git

References

Aytekin A (2019) Asynchronous first-order algorithms for large-scale optimization: analysis and implementation. PhD thesis, KTH Royal Institute of Technology
Aytekin A, Feyzmahdavian HR, Johansson M (2016) Analysis and implementation of an asynchronous optimization algorithm for the parameter server. arXiv preprint arXiv:1610.05507
Beck A (2017) First-order methods in optimization. SIAM
Beck A, Shtern S (2017) Linearly convergent away-step conditional gradient for non-strongly convex functions. Math Program 164(1–2):1–27
Beck A, Teboulle M (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imag Sci 2(1):183–202
Bolte J, Nguyen TP, Peypouquet J, Suter BW (2017) From error bounds to the complexity of first-order descent methods for convex functions. Math Program 165(2):471–507
Chretien S (2010) An alternating $ \ell _1 $ approach to the compressed sensing problem. IEEE Signal Process Lett 17(2):181–184
Combettes PL, Glaudin LE (2017) Quasi-nonexpansive iterations on the affine hull of orbits: from Mann's mean value algorithm to inertial methods. SIAM J Optim 27(4):2356–2380
Blatt D, Hero AO, Gauchman H (2007) A convergent incremental gradient method with a constant step size. SIAM J Optim 18(1):29–51
Drusvyatskiy D, Lewis AS (2013) Tilt stability, uniform quadratic growth, and strong metric regularity of the subdifferential. SIAM J Optim 23(1):256–267
Drusvyatskiy D, Lewis AS (2018) Error bounds, quadratic growth, and linear convergence of proximal methods. Math Oper Res 43(3):919–948
Alvarez F, Attouch H (2001) An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal 9(1):3–11
Feyzmahdavian HR, Aytekin A and Johansson M (2014) A delayed proximal gradient method with linear convergence rate. In: 2014 IEEE international workshop on machine learning for signal processing (MLSP), pp 1–6. IEEE
Gurbuzbalaban M, Ozdaglar A, Parrilo PA (2017) On the convergence rate of incremental aggregated gradient algorithms. SIAM J Optim 27(2):1035–1048
Hale ET, Yin W and Zhang Z (2007) A fixed-point continuation method for $ \ell _1 $-regularized minimization with applications to compressed sensing. CAAM TR07-07, Rice University, 43:44
Hoffman AJ (1952) On approximate solutions of systems of linear inequalities. J Res Natl Bur Stand 49(4):263–265
Jia Z, Huang J and Cai X (2021) Proximal-like incremental aggregated gradient method with Bregman distance in weakly convex optimization problems. J Global Optim, pp 1–24
Liang J, Fadili J, Peyré G (2016) A multi-step inertial forward-backward splitting method for non-convex optimization. In: Advances in neural information processing systems, pp 4035–4043
Johnstone PR, Moulin P (2017) Local and global convergence of a general inertial proximal splitting scheme for minimizing composite functions. Comput Optim Appl 67(2):259–292
László SC (2021) Convergence rates for an inertial algorithm of gradient type associated to a smooth non-convex minimization. Math Program 190(1):285–329
Latafat P, Themelis A, Ahookhosh M and Patrinos P (2021) Bregman Finito/MISO for nonconvex regularized finite sum minimization without Lipschitz gradient continuity. arXiv preprint arXiv:2102.10312
Li G, Pong TK (2018) Calculus of the exponent of Kurdyka-Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found Comput Math 18(5):1199–1232
Liu Y, Xia F (2021) Variable smoothing incremental aggregated gradient method for nonsmooth nonconvex regularized optimization. Optim Lett, pp 1–18
Li M, Zhou L, Yang Z, Li A, Xia F, Andersen DG and Smola A (2013) Parameter server for distributed machine learning. In: Big Learning NIPS Workshop, 6, pp 2
Łojasiewicz S (1959) Sur le problème de la division. Studia Math 18:87–136
Łojasiewicz S (1958) Division d’une distribution par une fonction analytique de variables réelles. Comptes Rendus Hebdomadaires Des Seances de l Academie Des Sciences 246(5):683–686
Meier L, Geer SV, Bühlmann P (2008) The group lasso for logistic regression. J Royal Stat Soc: Ser B (Stat Methodol) 70(1):53–71
Necoara I, Nesterov Y, Glineur F (2019) Linear convergence of first order methods for non-strongly convex optimization. Math Program 175(1):69–107
Nesterov Y (2013) Gradient methods for minimizing composite functions. Math Program 140(1):125–161
Ochs P (2018) Local convergence of the heavy-ball method and iPiano for non-convex optimization. J Optim Theory Appl 177(1):153–180
Ochs P, Brox T, Pock T (2015) iPiasco: inertial proximal algorithm for strongly convex optimization. J Math Imag Vision 53(2):171–181
Parikh N, Boyd S (2014) Proximal algorithms. Found Trends® Optim 1(3):127–239
Peng CJ, Lee KL, Ingersoll GM (2002) An introduction to logistic regression analysis and reporting. J Educ Res 96(1):3–14
Peng W, Zhang H, Zhang X (2019) Nonconvex proximal incremental aggregated gradient method with linear convergence. J Optim Theory Appl 183(1):230–245
Pock T, Sabach S (2016) Inertial proximal alternating linearized minimization (iPALM) for nonconvex and nonsmooth problems. SIAM J Imag Sci 9(4):1756–1787
Polyak BT (1964) Some methods of speeding up the convergence of iteration methods. USSR Comput Math Math Phys 4(5):1–17
Rockafellar R (1970) On the maximal monotonicity of subdifferential mappings. Pacific J Math 33(1):209–216
Scheinberg K, Goldfarb D, Bai X (2014) Fast first-order methods for composite convex optimization with backtracking. Found Comput Math 14(3):389–417
Simon N, Friedman J, Hastie T, Tibshirani R (2013) A sparse-group lasso. J Comput Graphical Stat 22(2):231–245
Vanli DN, Gurbuzbalaban M, Ozdaglar A (2018) Global convergence rate of proximal incremental aggregated gradient methods. SIAM J Optim 28(2):1282–1300
Wen B, Chen X, Pong TK (2017) Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems. SIAM J Optim 27(1):124–145
Yang Q, Liu Y, Chen T, Tong Y (2019) Federated machine learning: concept and applications. ACM Trans Intell Syst Technol (TIST) 10(2):1–19
Yu P, Li G, Pong TK (2021) Kurdyka-Łojasiewicz exponent via inf-projection. Found Comput Math, pp 1–47
Nesterov Y (2013) Introductory lectures on convex optimization: a basic course, volume 87. Springer Science & Business Media
Zhang H (2020) New analysis of linear convergence of gradient-type methods via unifying error bound conditions. Math Program 180(1):371–416
Zhang H, Dai Y, Guo L, Peng W (2021) Proximal-like incremental aggregated gradient method with linear convergence under Bregman distance growth conditions. Math Oper Res 46(1):61–81


Acknowledgements

We are grateful to the anonymous referees and the associate editor for their many useful comments, which allowed us to significantly improve the original presentation. This work is supported by the National Science Foundation of China (No. 11971480), the Natural Science Fund of Hunan for Excellent Youth (No. 2020JJ3038), and the Fund for NUDT Young Innovator Awards (No. 20190105).

Author information

Authors and Affiliations

Defense Innovation Institute, Chinese Academy of Military Science, Beijing, 100071, China
Xiaoya Zhang & Wei Peng
Department of Mathematics, National University of Defense Technology, Changsha, 410073, Hunan, China
Hui Zhang


Corresponding author

Correspondence to Hui Zhang.


Appendix: Proofs of Theorems and Lemmas

Proof of Theorem 1. From Lemma 2, for all $ k \ge 0$ we have

$$\begin{aligned} F(x_{k+1}) - F(x_k)&\le -\frac{1}{2\alpha }\Vert x_{k+1} -x_k\Vert ^2 +\frac{\beta }{\alpha }\langle x_k - x_{k-1},x_{k+1}-x_k\rangle + \\&\frac{L}{2} \sum \limits _{j=(k-\tau )_+}^{k-1}\Vert x_{j+1} - x_j\Vert ^2 . \end{aligned}$$

By combining the inequality above with Lemma 3, we get

$$\begin{aligned} (1+ \frac{\sigma \alpha }{8(2+\beta )}) (F(x_{k+1}) -F^{*})&\le (F(x_{k}) -F^{*}) \\&- \frac{1-2\beta }{4\alpha }\Vert x_{k+1}-x_{k}\Vert ^2 + \frac{3\beta }{4\alpha }\Vert x_k-x_{k-1}\Vert ^2 \\&+ \frac{3L}{4} \sum \limits _{j=(k-\tau )_+}^{k-1}\Vert x_{j+1} - x_j\Vert ^2, \forall k \ge 0. \end{aligned}$$

In order to apply Lemma 1, let $V_k:=F(x_{k})-F^{*}$, $\omega _k:=\Vert x_{k+1}-x_{k}\Vert ^2$, $a := 1/(1+ \frac{\sigma \alpha }{8(2+\beta )})$, $\alpha _1 := 1$, $\alpha _2 := 0$, $b_1 := (\frac{1-2\beta }{4\alpha })a$, $b_2 := \frac{3\beta }{4\alpha }a$, $c := \frac{3L}{4}a$, and $k_0 := \tau $. For this setting to satisfy the conditions of Lemma 1, the following inequalities must hold:

$$\begin{aligned} b_1-\frac{b_2}{a}>0, b_1>0 \end{aligned}$$

and

$$\begin{aligned} \frac{c}{1-a}\frac{1-a^{k_0+1}}{a^{k_0}} \le b_1 -\frac{b_2}{a} . \end{aligned}$$

The first condition is guaranteed by taking $\beta < \min \left\{ \frac{16}{83}, \frac{1}{2} \right\} = \frac{16}{83} $, since

$$\begin{aligned} \frac{b_2}{a} = \frac{3\beta }{4\alpha }(1+ \frac{\sigma \alpha }{8(2+\beta )})a \le \frac{3\beta }{4\alpha } \left( 1 + \frac{\alpha L}{16}\right) a \le \frac{3\beta }{4\alpha } \left( 1 + \frac{1}{16}\right) a\le b_1, \end{aligned}$$

according to $\alpha L \in (0,1]$ and $\beta \in [0,1)$. The second condition is guaranteed by

$$\begin{aligned} \alpha \le \frac{8(2+\beta )}{\sigma }\left[ \left( \frac{1 - \frac{83}{16}\beta }{24(2+\beta )Q} +1\right) ^{\frac{1}{\tau +1}} -1 \right] . \end{aligned}$$

Indeed, this bound yields the first inequality in the following chain:

$$\begin{aligned} \frac{c}{1-a}\frac{1-a^{k_0+1}}{a^{k_0}}&=\left[ \left( \frac{\alpha \sigma }{8(2+\beta )} + 1\right) ^{\tau +1} -1\right] \frac{6(\beta +2)L}{\alpha \sigma (1+ \frac{\alpha \sigma }{8(2+\beta )} )} \\&\le \frac{1 - \frac{83}{16}\beta }{4\alpha (1+ \frac{\alpha \sigma }{8(2+\beta )} )}\\&\le \frac{1- 2\beta }{4\alpha (1+ \frac{\alpha \sigma }{8(2+\beta )} )} - \frac{3\beta }{4\alpha }, \\&= b_1 -\frac{b_2}{a}, \end{aligned}$$

where the second inequality follows from $\beta \ge 0$ and $\alpha \sigma \le 1$. Hence, the second condition holds as well. Therefore, the claimed convergence follows from Lemma 1.
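As a numerical sanity check of the argument above, the following Python sketch evaluates the step-size bound and the Lemma 1 quantities for illustrative values. It assumes that $Q$ denotes the condition number $L/\sigma$ (which is what the chain of inequalities requires), and it additionally caps $\alpha$ at $1/((\tau+1)L)$, the bound used in the proof of Lemma 2. The function names and the numerical values are hypothetical and chosen only for illustration.

```python
def theorem1_step_bound(L, sigma, tau, beta):
    """Step-size bound from the proof of Theorem 1 (assumes Q = L / sigma)."""
    if not (0.0 <= beta < 16.0 / 83.0):
        raise ValueError("the proof requires 0 <= beta < 16/83")
    Q = L / sigma  # assumption: Q is the condition number
    inner = (1.0 - 83.0 / 16.0 * beta) / (24.0 * (2.0 + beta) * Q) + 1.0
    bound = 8.0 * (2.0 + beta) / sigma * (inner ** (1.0 / (tau + 1)) - 1.0)
    # Assumption: also enforce alpha <= 1 / ((tau + 1) L), as used in Lemma 2.
    return min(bound, 1.0 / ((tau + 1) * L))


def lemma1_gap(L, sigma, tau, beta, alpha):
    """Return ((b1 - b2/a) - c/(1-a) * (1 - a^(k0+1)) / a^k0, a).

    A nonnegative first entry means the second condition of Lemma 1 holds.
    """
    s = sigma * alpha / (8.0 * (2.0 + beta))
    a = 1.0 / (1.0 + s)
    b1 = (1.0 - 2.0 * beta) / (4.0 * alpha) * a
    b2 = 3.0 * beta / (4.0 * alpha) * a
    c = 3.0 * L / 4.0 * a
    k0 = tau
    lhs = c / (1.0 - a) * (1.0 - a ** (k0 + 1)) / a ** k0
    return (b1 - b2 / a) - lhs, a


# Illustrative values only.
L, sigma, tau, beta = 10.0, 1.0, 5, 0.1
alpha = theorem1_step_bound(L, sigma, tau, beta)
gap, a = lemma1_gap(L, sigma, tau, beta, alpha)
print("alpha =", alpha, " contraction factor a =", a, " gap =", gap)
```

For these values the gap is positive and $a \in (0,1)$, in agreement with the two conditions verified in the proof.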

Proof of Theorem 2. Since ${\mathcal {X}}$ is a nonempty closed convex set, the projection of any point $z$ onto ${\mathcal {X}}$ is unique; we denote it by $z^{*}$. Note that $F(x_k^{*})=F^{*}$. According to Lemma 4, we obtain

$$\begin{aligned} F(x_{k+1}) - F^{*} + \frac{1-\beta }{2\alpha }\Vert x_{k+1}-x_k^{*} \Vert ^2 \le&\frac{1}{2\alpha }\Vert x_k^{*} - x_{k}\Vert ^2 \nonumber \\&- \frac{1}{2\alpha }\Vert x_{k+1}-x_k\Vert ^2+ \frac{\beta }{2\alpha }\Vert x_{k}-x_{k-1}\Vert ^2 \nonumber \\&+ \frac{L(\tau +1)}{2}\sum _{j=(k-\tau )_+}^k\Vert x_{j+1}-x_{j}\Vert ^2, \forall k \ge 0 . \end{aligned}$$

(12)

Since $x_k^{*}\in {\mathcal {X}}$, by the definition of projection, it holds that

$$\begin{aligned} \Vert x_k^{*}-x_{k+1}\Vert ^2 \ge \Vert x_{k+1}^{*}-x_{k+1}\Vert ^2 = d^2(x_{k+1}, {\mathcal {X}}), \forall k \ge 0. \end{aligned}$$

Now, in terms of the expression of the Lyapunov function $\Psi $, we have

$$\begin{aligned} \Psi (x_{k+1})&\le \frac{1}{2\alpha }\Vert x_k^{*}-x_{k}\Vert ^2 -\frac{1}{2\alpha }\Vert x_{k+1}-x_{k} \Vert ^2\nonumber \\&+ \frac{\beta }{2\alpha }\Vert x_k-x_{k-1}\Vert ^2+ \frac{L(\tau +1)}{2}\sum _{j=(k-\tau )_+}^k\Vert x_{j+1}-x_{j}\Vert ^2, \forall k \ge 0. \end{aligned}$$

(13)

By using the quadratic growth condition, we obtain

$$\begin{aligned} \Vert x_k^{*}-x_k\Vert ^2 = d^2(x_{k}, {\mathcal {X}}) \le \frac{2}{\mu }(F(x_{k}) - F^{*}), \forall k \ge 0 \end{aligned}$$

and hence

$$\begin{aligned} \Vert x_k^{*}-x_{k}\Vert ^2 \le p\Vert x_k^{*}-x_{k}\Vert ^2 + \frac{2q}{\mu }(F(x_k)-F^{*}),\forall k \ge 0, \end{aligned}$$

(14)

with $p+q=1, p, q \ge 0$. Picking $p=\frac{1-\beta }{\alpha \mu +1-\beta },q=\frac{\alpha \mu }{\alpha \mu +1-\beta }$ and combining (13) and (14), we obtain

$$\begin{aligned} \Psi (x_{k+1})&\le \frac{1}{\alpha \mu +1-\beta }\Psi (x_k) -\frac{1}{2\alpha }\Vert x_{k+1}-x_{k} \Vert ^2+\frac{\beta }{2\alpha }\Vert x_k-x_{k-1}\Vert ^2 \nonumber \\&~~~~+ \frac{L(\tau +1)}{2}\sum _{j=(k-\tau )_+}^k\Vert x_{j+1}-x_{j}\Vert ^2, \forall k \ge 0. \end{aligned}$$

(15)

In order to apply Lemma 1, let $V_k=\Psi (x_k)$, $\omega _k=\Vert x_{k+1}-x_{k}\Vert ^2$, $a = \frac{1}{\alpha \mu +1-\beta }$, $\alpha _1 := 1$, $\alpha _2 := 0$, $b_1= \frac{1}{2\alpha }$, $b_2 = \frac{\beta }{2\alpha }$, $c=\frac{L(\tau +1)}{2}$, and $k_0=\tau $. We need the parameters to satisfy (5) and (6), that is,

$$\begin{aligned} \left\{ \begin{aligned}&0< a < 1, \\&\frac{b_2}{a} \le b_1, \\&\frac{c}{1-a} \frac{1-a^{k_0+1}}{a^{k_0}} \le b_1-\frac{b_2}{a}. \end{aligned} \right. \end{aligned}$$

(16)

Since $\alpha \le \frac{1}{L}$, if the parameters satisfy the following conditions

$$\begin{aligned}&0 \le \rho <1, \\&\alpha \le \min \left( \left[ \left( \frac{(1-\rho ) \mu r(\rho )}{L(\tau +1)} + 1 \right) ^{\frac{1}{\tau +1}} - 1 \right] /\mu , \frac{1}{L}\right) , \\&r(\rho ) := 1- \rho \frac{\mu }{L} - \rho (1-\rho )\frac{\mu ^2}{L^2}, \\&0 \le \beta \le \rho \alpha \mu , \end{aligned}$$

then we have that $0< a < 1$, $\frac{b_2}{a} \le b_1$ and

$$\begin{aligned} \frac{c}{1-a} \frac{1-a^{k_0+1}}{a^{k_0}}&=\frac{L(\tau +1)}{2} \frac{(\alpha \mu +1-\beta )^{\tau +1}-1}{\alpha \mu -\beta } \\&\le \frac{L(\tau +1)}{2} \frac{(\alpha \mu +1)^{\tau +1}-1}{(1-\rho )\alpha \mu } \\&\le \frac{1}{2\alpha } - \frac{\rho \alpha \mu }{2\alpha }(1+ \alpha \mu -\rho \alpha \mu ) \\&\le \frac{1}{2\alpha } - \frac{\beta }{2\alpha }(\alpha \mu +1-\beta )\\&=b_1-\frac{b_2}{a}. \end{aligned}$$

This indicates that (16) holds. Therefore, by Lemma 1, $\Psi (x_k)$ converges linearly in the sense of (9). The results (10) and (11) follow directly from the definition of $\Psi (x)$ and (9).
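The parameter rule used in this proof can be checked numerically. The Python sketch below computes $r(\rho)$, the step-size bound, and the largest admissible inertial parameter $\beta = \rho \alpha \mu$, and then evaluates the three conditions in (16). The function names and the sample values of $L$, $\mu$, $\tau$, $\rho$ are hypothetical; only the formulas are taken from the text.

```python
def theorem2_parameters(L, mu, tau, rho):
    """Return (alpha, beta) following the parameter conditions of the proof."""
    if not (0.0 <= rho < 1.0):
        raise ValueError("the proof requires 0 <= rho < 1")
    r = 1.0 - rho * mu / L - rho * (1.0 - rho) * (mu / L) ** 2
    root = ((1.0 - rho) * mu * r / (L * (tau + 1)) + 1.0) ** (1.0 / (tau + 1))
    alpha = min((root - 1.0) / mu, 1.0 / L)
    beta = rho * alpha * mu  # largest value allowed by 0 <= beta <= rho*alpha*mu
    return alpha, beta


def conditions_16(L, mu, tau, alpha, beta):
    """Evaluate the quantities appearing in (16)."""
    a = 1.0 / (alpha * mu + 1.0 - beta)
    b1 = 1.0 / (2.0 * alpha)
    b2 = beta / (2.0 * alpha)
    c = L * (tau + 1) / 2.0
    k0 = tau
    lhs = c / (1.0 - a) * (1.0 - a ** (k0 + 1)) / a ** k0
    return {"a": a, "b2_over_a": b2 / a, "b1": b1, "lhs": lhs,
            "rhs": b1 - b2 / a}


# Illustrative values only.
L, mu, tau, rho = 10.0, 1.0, 5, 0.5
alpha, beta = theorem2_parameters(L, mu, tau, rho)
q = conditions_16(L, mu, tau, alpha, beta)
print(alpha, beta, q)
```

For these values all three conditions in (16) hold with a visible margin, which reflects that the bounds in the proof are sufficient rather than tight.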

Proof of Lemma 1. Note that $A=\alpha _1 a$ and $B=\alpha _2 a^2$ in (5). The inequality (4) can be rewritten as:

$$\begin{aligned} V_{k+1}\le \alpha _1aV_k+\alpha _2a^2V_{k-1}-b_1\omega _k+b_2\omega _{k-1}+c\sum _{j=k-k_0}^k\omega _j. \end{aligned}$$

(17)

By dividing both sides of (17) by $a^{k+1}$ and summing the resulting inequality from $k=1$ to $K$ ($K \ge 1$), we derive that

$$\begin{aligned} \sum _{k=1}^K \frac{V_{k+1}}{a^{k+1}} \le&\alpha _1\sum _{k=1}^K \frac{V_{k}}{a^k}+\alpha _2\sum _{k=0}^{K-1}\frac{V_{k}}{a^{k}}-b_1\sum _{k=1}^K\frac{\omega _k}{a^{k+1}} \nonumber \\&+\frac{b_2}{a}\sum _{k=0}^{K-1}\frac{\omega _{k}}{a^{k+1}}+\sum _{k=1}^{K}\left[ \frac{c}{a^{k+1}}\sum _{j=k-k_0}^k\omega _j\right] \nonumber \\&\le \sum _{k=0}^K\frac{V_k}{a^k}-\left( b_1-\frac{b_2}{a}\right) \sum _{k=0}^K\frac{\omega _k}{a^{k+1}}\nonumber \\&+\sum _{k=0}^{K}\left[ \frac{c}{a^{k+1}}\sum _{j=k-k_0}^k\omega _j\right] +\frac{b_1\omega _0}{a} . \end{aligned}$$

(18)

Since $\omega _k=0$ for $k<0$, $\omega _k \ge 0$ for $k \ge 0$, and $a>0$, we get

$$\begin{aligned} \sum _{k=0}^{K}\left[ \frac{c}{a^{k+1}}\sum _{j=k-k_0}^k\omega _j\right]&\le \frac{c}{a}\left( \frac{1}{a^{-k_0}} + \frac{1}{a^{-k_0+1}}+ \cdots + \frac{1}{a^{0}} \right) \omega _{-k_0} \nonumber \\&~~~~+ \frac{c}{a}\left( \frac{1}{a^{-k_0+1}} + \frac{1}{a^{-k_0+2}} + \cdots + \frac{1}{a^{1}} \right) \omega _{-k_0+1} \nonumber \\&~~~~ + \cdots + \frac{c}{a}\left( \frac{1}{a^{-k_0+K}} + \frac{1}{a^{-k_0+K+1}}+ \cdots + \frac{1}{a^{K}} \right) \omega _{-k_0+K}\nonumber \\&~~~~ + \cdots + \frac{c}{a}\left( \frac{1}{a^{K}} + \frac{1}{a^{K+1}} + \cdots + \frac{1}{a^{K+k_0}} \right) \omega _{K}\nonumber \\&\le \sum _{j=0}^{K+k_0}\frac{c}{a}\left( \frac{1}{a^{j-k_0}}+\dots + \frac{1}{a^{j}}\right) \omega _{j-k_0}\nonumber \\&= \sum _{k=-k_0}^{K}\frac{c}{a}\left( \frac{1}{a^{k}}+\dots + \frac{1}{a^{k+k_0}}\right) \omega _{k}\nonumber \\&= \sum _{k=0}^{K}\frac{c}{a}\left( \frac{1}{a^{k}}+\dots + \frac{1}{a^{k+k_0}}\right) \omega _{k}. \end{aligned}$$

(19)

Therefore, combining (19) with (18), we have

$$\begin{aligned}\sum _{k=1}^K \frac{V_{k+1}}{a^{k+1}}&\le \sum _{k=0}^K\frac{V_k}{a^k}+ \frac{b_1\omega _0}{a}+\sum _{k=0}^K\left[ c\left( 1+\frac{1}{a}+\cdots +\frac{1}{a^{k_0}}\right) -\left( b_1-\frac{b_2}{a}\right) \right] \frac{\omega _k}{a^{k+1}}\\&= \sum _{k=0}^K\frac{V_k}{a^k} + \frac{b_1\omega _0}{a}+ \sum _{k=0}^K\left[ \frac{c}{1-a}\frac{1-a^{k_0+1}}{a^{k_0}}-\left( b_1-\frac{b_2}{a}\right) \right] \frac{\omega _k}{a^{k+1}}. \end{aligned}$$

By condition (6), we obtain

$$\begin{aligned} \sum _{k=1}^K \frac{V_{k+1}}{a^{k+1}} \le \sum _{k=0}^K\frac{V_k}{a^k}+ \frac{b_1\omega _0}{a}, \end{aligned}$$

(20)

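that is, after cancelling the terms with $2 \le k \le K$ that appear on both sides, $\frac{V_{K+1}}{a^{K+1}} \le V_0 + \frac{V_1}{a}+ \frac{b_1\omega _0}{a}$ for $K \ge 1$. Multiplying by $a^{K+1}$ gives $V_{K+1} \le a^{K}\left( V_1+aV_0+b_1\omega _0\right) $. Besides, we know $V_1 \le V_1+aV_0+b_1\omega _0$. Therefore, for all $K\ge 1$, we have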

$$\begin{aligned} V_{K}\le a^{K-1}\left( V_1+aV_0+b_1\omega _0\right) . \end{aligned}$$

This completes the proof.
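The index exchange behind (19) and the geometric sum used in the step that follows it can be verified numerically. The sketch below is only an illustration under assumed values: the sequence $\omega_k$ is random and nonnegative, and the constants $a$, $c$, $k_0$ as well as all function names are hypothetical.

```python
import random


def windowed_double_sum(omega, a, c, k0):
    """Left-hand side of (19), with omega_j = 0 for j < 0."""
    K = len(omega) - 1
    total = 0.0
    for k in range(K + 1):
        window = sum(omega[j] for j in range(max(0, k - k0), k + 1))
        total += c / a ** (k + 1) * window
    return total


def reindexed_bound(omega, a, c, k0):
    """Right-hand side of (19): omega_k weighted by (c/a)(a^-k + ... + a^-(k+k0))."""
    K = len(omega) - 1
    total = 0.0
    for k in range(K + 1):
        weight = sum(1.0 / a ** i for i in range(k, k + k0 + 1))
        total += c / a * weight * omega[k]
    return total


def geometric_factor(a, k0):
    """Closed form of 1 + 1/a + ... + 1/a^k0 used after (19)."""
    return (1.0 - a ** (k0 + 1)) / ((1.0 - a) * a ** k0)


random.seed(0)
omega = [random.random() for _ in range(30)]  # arbitrary nonnegative sequence
a, c, k0 = 0.9, 2.0, 4
lhs = windowed_double_sum(omega, a, c, k0)
rhs = reindexed_bound(omega, a, c, k0)
closed = sum(c * geometric_factor(a, k0) * w / a ** (k + 1)
             for k, w in enumerate(omega))
print(lhs <= rhs, abs(rhs - closed))
```

The inequality holds because the right-hand side contains every term of the left-hand side plus additional nonnegative terms for indices beyond $K$.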

Proof of Lemma 2. Since $\nabla f$ is $L$-Lipschitz continuous, we have

$$\begin{aligned} f(x_{k+1}) - f(x_k)&\le \langle \nabla f(x_k) , x_{k+1} - x_k \rangle + \frac{L}{2} \Vert x_{k+1} - x_k\Vert ^2, \forall k \ge 0 . \end{aligned}$$

(21)

Together with the gradient inequality for the convex function $f$,

$$\begin{aligned} f(x) - f(x_{k})&\ge \langle \nabla f(x_k) , x - x_k \rangle , \forall k \ge 0, \end{aligned}$$

(22)

we have

$$\begin{aligned} f(x_{k+1}) - f(x)&\le \langle \nabla f(x_k) , x_{k+1} - x\rangle + \frac{L}{2} \Vert x_{k+1} - x_k\Vert ^2, \forall k \ge 0. \end{aligned}$$

(23)

Since $x_{k+1}$ is the minimizer of the $\frac{1}{\alpha }$-strongly convex function

$$\begin{aligned} z \mapsto h(z) + \frac{1}{2\alpha }\Vert z-(x_k+\beta (x_k-x_{k-1})-\alpha g_k)\Vert ^2, \end{aligned}$$

where $ g_k =\sum _{n=1}^N \nabla f_n(x_{k-\tau _{k}^n}) $, we have

$$\begin{aligned}&h(x_{k+1}) + \frac{1}{2\alpha }\Vert x_{k+1}-(x_k+\beta (x_k-x_{k-1})-\alpha g_k)\Vert ^2 \nonumber \\&\le h(x) + \frac{1}{2\alpha }\Vert x-(x_k+\beta (x_k-x_{k-1}) -\alpha g_k)\Vert ^2 - \frac{1}{2\alpha }\Vert x-x_{k+1}\Vert ^2, \forall k \ge 0. \end{aligned}$$

(24)

By combining (23) and (24), we derive that

$$\begin{aligned} F(x_{k+1}) - F(x)&\le \langle \nabla f(x_k) - g_k, x_{k+1} - x\rangle + \frac{1}{2\alpha }\Vert x - x_k\Vert ^2 - \frac{1}{2\alpha }\Vert x -x_{k+1}\Vert ^2 \nonumber \\&\qquad - \frac{1}{2\alpha }\Vert x_{k+1} - x_k\Vert ^2 +\frac{\beta }{\alpha }\langle x_k - x_{k-1},x_{k+1}-x\rangle + \frac{L}{2}\Vert x_{k+1} - x_k\Vert ^2 \nonumber \\&\le \Vert \nabla f(x_k) - g_k\Vert \Vert x_{k+1} - x\Vert + \frac{1}{2\alpha }\Vert x - x_k\Vert ^2 - \frac{1}{2\alpha }\Vert x -x_{k+1}\Vert ^2 \nonumber \\&\qquad - \frac{1}{2\alpha }\Vert x_{k+1} - x_k\Vert ^2 +\frac{\beta }{\alpha }\langle x_k - x_{k-1},x_{k+1}-x\rangle + \frac{L}{2}\Vert x_{k+1} - x_k\Vert ^2, \forall k \ge 0, \end{aligned}$$

where the second inequality follows from the Cauchy-Schwarz inequality. Note that $\Vert \nabla f(x_k) - g_k\Vert \le L \sum \limits _{j=(k-\tau )_+}^{k-1} \Vert x_{j+1} - x_j\Vert $. Then, by the elementary inequality $ab \le \frac{1}{2}(a^2+b^2)$, we have that

$$\begin{aligned} \Vert \nabla f(x_k) - g_k\Vert \Vert x_{k+1} - x\Vert&\le \frac{L}{2} \sum \limits _{j=(k-\tau )_+}^{k-1} (\Vert x_{j+1} - x_j\Vert ^2 + \Vert x_{k+1} - x\Vert ^2),\forall k \ge 0. \end{aligned}$$

Hence

$$\begin{aligned} F(x_{k+1}) - F(x)&\le \left( \frac{\tau L}{2} - \frac{1}{2\alpha }\right) \Vert x- x_{k+1}\Vert ^2 + \frac{1}{2\alpha } \Vert x - x_k\Vert ^2 \\&\qquad + \left( \frac{L}{2}-\frac{1}{2\alpha }\right) \Vert x_{k+1} -x_k\Vert ^2 \\&\qquad + \frac{\beta }{\alpha }\langle x_k - x_{k-1},x_{k+1}-x\rangle + \frac{L}{2} \sum \limits _{j=(k-\tau )_+}^{k-1}\Vert x_{j+1} - x_j\Vert ^2 \\&\le - \frac{L}{2} \Vert x- x_{k+1}\Vert ^2 + \frac{1}{2\alpha } \Vert x - x_k\Vert ^2 + \left( \frac{L}{2}-\frac{1}{2\alpha }\right) \Vert x_{k+1} -x_k\Vert ^2 \\&\qquad + \frac{\beta }{\alpha }\langle x_k - x_{k-1},x_{k+1}-x\rangle + \frac{L}{2} \sum \limits _{j=(k-\tau )_+}^{k-1}\Vert x_{j+1} - x_j\Vert ^2, \forall k \ge 0, \end{aligned}$$

where the last inequality uses the bound $\alpha \le \frac{1}{(\tau +1)L}$. Setting $x = x_k$ for all $k \ge 0$, we obtain

$$\begin{aligned} F(x_{k+1})&- F(x_k) \le -\frac{1}{2\alpha } \Vert x_{k+1} - x_k\Vert ^2 +\frac{\beta }{\alpha }\langle x_k - x_{k-1},x_{k+1}-x_k\rangle \\&+ \frac{L}{2} \sum \limits _{j=(k-\tau )_+}^{k-1}\Vert x_{j+1} - x_j\Vert ^2 . \end{aligned}$$
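The update analyzed in this proof, $x_{k+1} = \text{prox}_h^\alpha (x_k+\beta (x_k-x_{k-1})-\alpha g_k)$ with the aggregated gradient $g_k$, can be sketched in a few lines of plain Python. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes least-squares components $f_n(x) = \frac{1}{2}(a_n^\top x - b_n)^2$, $h = \lambda \Vert \cdot \Vert_1$, and a cyclic refresh of one component gradient per iteration (so that every delay is at most $N-1$). All names, the toy data, and the step-size choice are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def soft_threshold(v, t):
    """Proximal map of t * ||.||_1, applied componentwise."""
    return [(abs(vi) - t) * (1.0 if vi > 0 else -1.0) if abs(vi) > t else 0.0
            for vi in v]


def component_grad(a_n, b_n, x):
    """Gradient of the assumed component f_n(x) = 0.5 * (a_n . x - b_n)^2."""
    residual = dot(a_n, x) - b_n
    return [residual * ai for ai in a_n]


def ipiag_lasso(A, b, lam, alpha, beta, num_iters):
    """Inertial proximal step with an aggregated, partially stale gradient."""
    N, d = len(A), len(A[0])
    x_prev = [0.0] * d
    x = [0.0] * d
    # table[n] holds the gradient of f_n at the point where it was last refreshed.
    table = [component_grad(A[n], b[n], x) for n in range(N)]
    g = [sum(table[n][i] for n in range(N)) for i in range(d)]
    for k in range(num_iters):
        n = k % N  # assumed cyclic refresh order
        fresh = component_grad(A[n], b[n], x)
        g = [g[i] - table[n][i] + fresh[i] for i in range(d)]
        table[n] = fresh
        y = [x[i] + beta * (x[i] - x_prev[i]) - alpha * g[i] for i in range(d)]
        x_prev, x = x, soft_threshold(y, alpha * lam)
    return x


def prox_grad_residual(A, b, lam, alpha, x):
    """Norm of x - prox(x - alpha * grad f(x)); it vanishes at minimizers."""
    d = len(x)
    grad = [0.0] * d
    for a_n, b_n in zip(A, b):
        gn = component_grad(a_n, b_n, x)
        grad = [grad[i] + gn[i] for i in range(d)]
    x_new = soft_threshold([x[i] - alpha * grad[i] for i in range(d)],
                           alpha * lam)
    return sum((x_new[i] - x[i]) ** 2 for i in range(d)) ** 0.5


# Toy data, for illustration only.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 3.2]
lam = 0.1
sum_Ln = sum(dot(a_n, a_n) for a_n in A)  # upper bound on L for this toy f
alpha = 1.0 / (len(A) * sum_Ln)           # conservative, in the spirit of alpha <= 1/((tau+1)L)
x_hat = ipiag_lasso(A, b, lam, alpha, beta=0.1, num_iters=3000)
print(x_hat, prox_grad_residual(A, b, lam, alpha, x_hat))
```

Keeping a running sum `g` and a per-component table avoids recomputing all $N$ gradients at each iteration, which is the practical appeal of aggregated-gradient schemes.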

Proof of Lemma 3. Denote $x_{k+1}^\prime = \text {prox}_h^\alpha (x_k - \alpha \nabla f(x_k))$. The firm nonexpansivity of the proximal map (see Theorem 6.42 in Beck 2017) implies

$$\begin{aligned} \Vert \text {prox}_h^\alpha (x) - \text {prox}_h^\alpha (y)\Vert ^2 \le \langle \text {prox}_h^\alpha (x) - \text {prox}_h^\alpha (y), x-y\rangle . \end{aligned}$$

(25)

Substituting $x=x_k -\alpha \nabla f(x_k)$ and $y = x^{*} -\alpha \nabla f( x^{*})$ (for which $ \text {prox}_h^\alpha (y) = x^{*}$ by the optimality of $x^{*}$) into (25) yields

$$\begin{aligned} \Vert x_{k+1}^\prime -x^{*} \Vert ^2&\le \langle x_{k+1}^\prime -x^{*} , x_k -\alpha \nabla f(x_k) - x^{*} + \alpha \nabla f( x^{*}) \rangle \\&= \langle x_{k+1}^\prime -x^{*}, x_{k+1}^\prime -x^{*} \rangle + \langle x_{k+1}^\prime -x^{*}, - x_{k+1}^\prime \\&\quad + x_k -\alpha \nabla f(x_k) + \alpha \nabla f( x^{*}) \rangle , \end{aligned}$$

which implies $ 0 \le \langle x_{k+1}^\prime -x^{*}, - x_{k+1}^\prime + x_k -\alpha \nabla f(x_k) + \alpha \nabla f( x^{*}) \rangle $ for all $ k \ge 0$. This inequality can be rewritten as follows:

$$\begin{aligned} \alpha \langle x_k -x^{*} , \nabla f(x_k) - \nabla f( x^{*})\rangle&\le \langle x_k -x^{*}, x_k -x_{k+1}^\prime \rangle - \Vert x_{k+1}^\prime -x_k \Vert ^2 \nonumber \\&~~~~~~~~+ \alpha \langle x_{k+1}^\prime -x_k ,\nabla f( x^{*})-\nabla f(x_k) \rangle \nonumber \\&\le \Vert x_{k+1}^\prime -x_k \Vert \left( \Vert x_k -x^{*}\Vert + \alpha \Vert \nabla f( x^{*})- \nabla f(x_k) \Vert \right) \nonumber \\&\le \Vert x_{k+1}^\prime -x_k\Vert \left( \Vert x_k -x^{*}\Vert + \alpha L \Vert x_k -x^{*} \Vert \right) \nonumber \\&= (\alpha L +1)\Vert x_{k+1}^\prime -x_k \Vert \Vert x_k -x^{*}\Vert , \forall k \ge 0, \end{aligned}$$

(26)

where the second inequality uses the nonpositivity of $- \Vert x_{k+1}^\prime -x_k \Vert ^2 $ and the Cauchy-Schwarz inequality, and the third inequality follows from the $L$-Lipschitz continuity of $\nabla f$. Since $ \langle \nabla f(x_k) - \nabla f(x^{*}), x_k -x^{*} \rangle \ge \sigma \Vert x_k - x^{*}\Vert ^2$ by the strong convexity of $f$, and since $\alpha L \le 1$, we obtain

$$\begin{aligned} \sigma \Vert x_k - x^{*}\Vert \le \frac{\alpha L+1}{\alpha }\Vert x_{k+1}^\prime -x_k \Vert \le \frac{2}{\alpha }\Vert x_{k+1}^\prime -x_k \Vert , \forall k \ge 0. \end{aligned}$$

Thus,

$$\begin{aligned} \frac{\sigma }{2}\Vert x_k - x^{*}\Vert&\le \frac{1}{\alpha }\Vert x_{k+1}^\prime - x_{k+1}\Vert + \frac{1}{\alpha }\Vert x_{k+1} - x_k\Vert \nonumber \\&= \frac{1}{\alpha }\Vert \text {prox}_h^\alpha (x_k - \alpha \nabla f(x_k)) - \text {prox}_h^\alpha (x_k - \alpha g_k + \beta (x_k - x_{k-1})) \Vert \nonumber \\&\quad + \frac{1}{\alpha }\Vert x_{k+1} - x_k\Vert \nonumber \\&\le \Vert \nabla f(x_k) - g_k + \frac{\beta }{\alpha }(x_k -x_{k-1})\Vert + \frac{1}{\alpha }\Vert x_{k+1} - x_k\Vert , \forall k \ge 0, \end{aligned}$$

(27)

where the second inequality follows from the nonexpansiveness of the proximal map, $\Vert \text {prox}_h^\alpha (x) - \text {prox}_h^\alpha (y) \Vert \le \Vert x-y\Vert $ for any $x, y \in {\mathbb {R}}^d$. After rearranging the terms and multiplying both sides by $\alpha \Vert x_{k+1} - x_k\Vert $, we obtain

$$\begin{aligned} -\Vert x_{k+1} - x_k\Vert ^2 \le&-\frac{\alpha \sigma }{2}\Vert x_k - x^{*}\Vert \Vert x_{k+1} - x_k\Vert \nonumber \\&~~~~ + \Vert \alpha ( \nabla f(x_k) - g_k )+ \beta (x_k -x_{k-1} )\Vert \Vert x_{k+1} - x_k\Vert , \forall k \ge 0. \end{aligned}$$

(28)

The first term $ -\frac{\alpha \sigma }{2}\Vert x_k - x^{*}\Vert \Vert x_{k+1} - x_k\Vert $ of the right-hand side can be bounded as follows:

$$\begin{aligned} - \Vert x_k - x^{*}\Vert \Vert x_{k+1} - x_k\Vert&\le - \langle x^{*} - x_k, x_{k+1}-x_k\rangle \\&= - \Vert x_{k+1}- x_k\Vert ^2 - \langle x^{*} - x_{k+1}, x_{k+1}-x_k\rangle \\&= - \Vert x_{k+1}- x_k\Vert ^2 \\&\quad + \langle x_{k+1}-x^{*}, \beta (x_{k}-x_{k-1}) + \alpha (-g_k-r_{k+1})\rangle , \end{aligned}$$

where $r_{k+1} \in \partial h(x_{k+1})$ denotes the subgradient appearing in the optimality condition of the proximal step. By the convexity of $h$ and (23), we have $ \langle x_{k+1}-x^{*}, -\alpha r_{k+1}\rangle \le \alpha (h(x^{*}) - h(x_{k+1})) $ and $f(x_{k+1}) - f(x^{*}) \le \langle \nabla f(x_k) , x_{k+1} - x^{*} \rangle + \frac{L}{2} \Vert x_{k+1} - x_k\Vert ^2$. Denote $F_k := F(x_k)-F(x^{*})$; then we can derive that

$$\begin{aligned} - \Vert x_k - x^{*}\Vert \Vert x_{k+1} - x_k\Vert&\le - \Vert x_{k+1}- x_k\Vert ^2 + \langle x_{k+1}-x^{*}, \beta (x_{k}-x_{k-1}) \rangle - \alpha F_{k+1} \nonumber \\&\quad + \alpha \langle g_k - \nabla f(x_{k}), x^{*}-x_{k+1}\rangle + \frac{\alpha L}{2}\Vert x_{k+1}-x_k\Vert ^2 \nonumber \\&\le -\alpha F_{k+1} + \left( \frac{\alpha L}{2} -1 \right) \Vert x_{k+1}- x_k\Vert ^2 \nonumber \\&\quad + \Vert x_{k+1}-x^{*}\Vert \Vert \alpha (\nabla f(x_k) - g_k )+ \beta (x_{k}-x_{k-1}) \Vert \nonumber \\&\le -\alpha F_{k+1} + \left( \frac{\alpha L}{2} -1 \right) \Vert x_{k+1}- x_k\Vert ^2 + \frac{2}{\alpha \sigma } \Vert \alpha (\nabla f(x_k) - g_k )+ \beta (x_{k}-x_{k-1}) \Vert ^2 \nonumber \\&\quad + \frac{2 + \alpha \sigma }{\alpha \sigma } \Vert x_{k+1}-x_k\Vert \Vert \alpha (\nabla f(x_k) - g_k )+ \beta (x_{k}-x_{k-1}) \Vert , \forall k \ge 0, \end{aligned}$$

(29)

where the last inequality is from (27). By combining (29) and (28), we get

$$\begin{aligned} -\Vert x_{k+1} - x_k\Vert ^2&\le -\frac{\alpha ^2\sigma }{2}F_{k+1} + \frac{\alpha \sigma }{2}\left( \frac{\alpha L}{2} -1 \right) \Vert x_{k+1}- x_k\Vert ^2 \nonumber \\&~~+ \Vert \alpha (\nabla f(x_k) - g_k )+ \beta (x_{k}-x_{k-1}) \Vert ^2 \nonumber \\&~~+ \left( 2 + \frac{ \alpha \sigma }{2} \right) \Vert x_{k+1}-x_k\Vert \Vert \alpha (\nabla f(x_k) - g_k )+ \beta (x_{k}-x_{k-1}) \Vert . \end{aligned}$$

(30)

Since $ \Vert \nabla f(x_k) - g_k\Vert \le L \sum \limits _{j=(k-\tau )_+}^{k-1} \Vert x_{j+1} - x_j\Vert $, we have that

$$\begin{aligned}&\Vert \alpha (\nabla f(x_k) - g_k )+ \beta (x_{k}-x_{k-1}) \Vert ^2 \nonumber \\&\le \left( \alpha L \sum \limits _{j=(k-\tau )_+}^{k-1} \Vert x_{j+1} - x_j\Vert + \beta \Vert x_{k}-x_{k-1}\Vert \right) ^2 \nonumber \\&\le \left( \tau \alpha ^2L^2 +\alpha L\beta \right) \sum \limits _{j=(k-\tau )_+}^{k-1} \Vert x_{j+1} - x_j\Vert ^2 +(\alpha \tau L\beta + \beta ^2) \Vert x_{k}-x_{k-1}\Vert ^2, \end{aligned}$$

(31)

and

$$\begin{aligned}&\Vert x_{k+1}-x_k\Vert \Vert \alpha (\nabla f(x_k) - g_k )+ \beta (x_{k}-x_{k-1}) \Vert \nonumber \\&\le \Vert x_{k+1}-x_k\Vert \left( \alpha L \sum \limits _{j=(k-\tau )_+}^{k-1} \Vert x_{j+1} - x_j\Vert + \beta \Vert x_{k}-x_{k-1}\Vert \right) \nonumber \\&\le \frac{ \alpha L}{2}\sum \limits _{j=(k-\tau )_+}^{k-1} \Vert x_{j+1} - x_j\Vert ^2 +\frac{\tau \alpha L + \beta }{2}\Vert x_{k+1}-x_{k}\Vert ^2+ \frac{\beta }{2} \Vert x_{k}-x_{k-1}\Vert ^2. \end{aligned}$$

(32)

Combining (30), (31), and (32) yields

$$\begin{aligned} -\Vert x_{k+1} - x_k\Vert ^2&\le -\frac{\alpha ^2\sigma }{2}F_{k+1} +\left[ \frac{\alpha \sigma }{2}\left( \frac{\alpha L}{2} -1\right) + \left( 2 + \frac{\alpha \sigma }{2}\right) \frac{\tau \alpha L+\beta }{2} \right] \Vert x_{k+1}- x_k\Vert ^2 \\&\quad + \alpha L \left[ \tau \alpha L + \beta + \left( 2 + \frac{\alpha \sigma }{2}\right) \frac{1}{2} \right] \sum \limits _{j=(k-\tau )_+}^{k-1} \Vert x_{j+1} - x_j\Vert ^2 \\&\quad + \beta \left[ \left( 2 + \frac{\alpha \sigma }{2}\right) \frac{1}{2} + \alpha \tau L +\beta \right] \Vert x_{k}-x_{k-1} \Vert ^2, \forall k \ge 0. \end{aligned}$$

Since $\alpha \le \frac{1}{(\tau +1)L}, 0 \le \beta < 1$, we have the following bounds

$$\begin{aligned}&\frac{\alpha \sigma }{2}\left( \frac{\alpha L}{2} -1\right) + \left( 2 + \frac{\alpha \sigma }{2}\right) \frac{\tau \alpha L+\beta }{2} \le \frac{\alpha \sigma }{2}\frac{\beta -1}{2}+ \tau \alpha L +\beta \le \beta + 1, \\&\alpha L\left[ \tau \alpha L + \beta + (2+\frac{\alpha \sigma }{2})\frac{1}{2} \right] \le \alpha L \left( 1- \alpha L + \beta + 1 +\frac{\alpha L}{4}\right) \le \alpha L ( \beta + 2 ), \\&\left( 2 + \frac{ \alpha L}{2} \right) \frac{\beta }{2} + \tau \alpha L \beta +\beta ^2 \le \beta ( \beta + 2). \end{aligned}$$

Hence, for $ k \ge 0$ it holds that

$$\begin{aligned} -\Vert x_{k+1} - x_k\Vert ^2 \le&-\frac{\sigma }{2(2+\beta )}\alpha ^2 F_{k+1} + \alpha L\sum \limits _{j=(k-\tau )_+}^{k-1}\Vert x_{j+1} - x_j\Vert ^2 +\beta \Vert x_k-x_{k-1}\Vert ^2. \end{aligned}$$

This completes the proof.

Proof of Lemma 4. Since each component function $f_n$ is convex with an $L_n$-Lipschitz continuous gradient, we have

$$\begin{aligned} f_n(x_{k+1})&\le f_n(x_{k-\tau _k^n}) + \langle \nabla f_n(x_{k-\tau _k^n}), x_{k+1}- x_{k-\tau _k^n} \rangle + \frac{L_n}{2} \Vert x_{k+1}- x_{k-\tau _k^n}\Vert ^2 \nonumber \\&\le f_n(x) + \langle \nabla f_n(x_{k-\tau _k^n}), x_{k+1}- x \rangle + \frac{L_n}{2} \Vert x_{k+1}- x_{k-\tau _k^n}\Vert ^2, \quad n=1,\ldots , N, \end{aligned}$$

(33)

where the second inequality follows from the convexity of $f_n$. Summing (33) over all component functions and using the expression of $g_k$, we have

$$\begin{aligned} f(x_{k+1}) \le f(x) + \langle g_k, x_{k+1}- x \rangle + \sum _{n=1}^N \frac{L_n}{2} \Vert x_{k+1}- x_{k-\tau _k^n}\Vert ^2, \forall k \ge 0. \end{aligned}$$

(34)

Note that $x_{k+1}$ is the minimizer of the $\frac{1}{\alpha }$-strongly convex function:

$$\begin{aligned} z \mapsto h(z) + \frac{1}{2\alpha }\Vert z-(x_k+\beta (x_k-x_{k-1}) -\alpha g_k)\Vert ^2, \forall k \ge 0. \end{aligned}$$

Then, we have

$$\begin{aligned}&h(x_{k+1}) + \frac{1}{2\alpha }\Vert x_{k+1}-\left( x_k+\beta (x_k-x_{k-1}) -\alpha g_k\right) \Vert ^2 \nonumber \\ \le&h(x) + \frac{1}{2\alpha }\Vert x-(x_k+\beta (x_k-x_{k-1})-\alpha g_k)\Vert ^2-\frac{1}{2\alpha }\Vert x-x_{k+1}\Vert ^2. \end{aligned}$$

(35)
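Inequality (35) is an instance of a standard property of strongly convex functions, recalled here as a brief sketch. The symbols $\varphi $, $\mu $ and $z^\star $ are introduced only for this remark: if $\varphi $ is $\mu $-strongly convex and $z^\star $ is its minimizer, then

$$\begin{aligned} \varphi (z^\star ) + \frac{\mu }{2}\Vert x - z^\star \Vert ^2 \le \varphi (x), \quad \forall x \in {\mathbb {R}}^d. \end{aligned}$$

Taking $\mu = \frac{1}{\alpha }$, $z^\star = x_{k+1}$, and $\varphi (z) = h(z) + \frac{1}{2\alpha }\Vert z-(x_k+\beta (x_k-x_{k-1}) -\alpha g_k)\Vert ^2$ recovers (35).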

After rearranging the terms of (35), we further get

$$\begin{aligned} \langle x_{k+1}-x, g_k \rangle \le&h(x) - h(x_{k+1})+ \frac{1}{2\alpha }\Vert x-x_k-\beta (x_k-x_{k-1})\Vert ^2 -\frac{1}{2\alpha }\Vert x_{k+1}-x\Vert ^2 \nonumber \\&~~~~-\frac{1}{2\alpha }\Vert x_{k+1}-x_k-\beta (x_k-x_{k-1})\Vert ^2, \forall k \ge 0, \forall x \in {\mathbb {R}}^d. \end{aligned}$$

(36)
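The rearrangement leading to (36) can be made explicit as follows. This is a sketch in which the shorthand $y_k := x_k+\beta (x_k-x_{k-1})$ is introduced only for this remark. Expanding the two squared norms in (35) gives

$$\begin{aligned} \Vert x_{k+1}-(y_k-\alpha g_k)\Vert ^2&= \Vert x_{k+1}-y_k\Vert ^2 + 2\alpha \langle g_k, x_{k+1}-y_k \rangle + \alpha ^2\Vert g_k\Vert ^2, \\ \Vert x-(y_k-\alpha g_k)\Vert ^2&= \Vert x-y_k\Vert ^2 + 2\alpha \langle g_k, x-y_k \rangle + \alpha ^2\Vert g_k\Vert ^2. \end{aligned}$$

Subtracting the second identity from the first and dividing by $2\alpha $ leaves $\langle g_k, x_{k+1}-x \rangle + \frac{1}{2\alpha }\Vert x_{k+1}-y_k\Vert ^2 - \frac{1}{2\alpha }\Vert x-y_k\Vert ^2$, and substituting this into (35) yields (36).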

By combining (34) and (36), we derive that

$$\begin{aligned} F(x_{k+1})&\le F(x) + \frac{1}{2\alpha }\Vert x- x_k-\beta (x_k-x_{k-1})\Vert ^2 - \frac{1}{2\alpha }\Vert x_{k+1}-x\Vert ^2 \nonumber \\&~~~~~~~~-\frac{1}{2\alpha }\Vert x_{k+1}-x_k-\beta (x_k-x_{k-1})\Vert ^2 + \sum _{n=1}^N \frac{L_n}{2} \Vert x_{k+1}- x_{k-\tau _k^n}\Vert ^2 \nonumber \\&= F(x) + \frac{1}{2\alpha }\Vert x- x_k\Vert ^2 -\frac{1}{2\alpha }\Vert x_{k+1}-x_{k}\Vert ^2 + \frac{\beta }{\alpha } \langle x_{k+1}-x, x_k-x_{k-1} \rangle \nonumber \\&~~~~~~~~ - \frac{1}{2\alpha }\Vert x_{k+1}-x\Vert ^2 + \sum _{n=1}^N \frac{L_n}{2} \Vert x_{k+1}- x_{k-\tau _k^n}\Vert ^2 \nonumber \\&\le F(x) + \frac{1}{2\alpha }\Vert x- x_{k}\Vert ^2 - \frac{1}{2\alpha }\Vert x_{k+1}-x_k\Vert ^2 - \frac{1}{2\alpha }\Vert x_{k+1}-x\Vert ^2 \nonumber \\&~~~~~~~~+ \frac{\beta }{2\alpha }\left( \Vert x-x_{k+1}\Vert ^2+\Vert x_{k}-x_{k-1}\Vert ^2\right) + \sum _{n=1}^N \frac{L_n}{2} \Vert x_{k+1}- x_{k-\tau _k^n}\Vert ^2. \end{aligned}$$

(37)

By Jensen's inequality, applied to the telescoping sum of at most $\tau +1$ consecutive differences, we obtain

$$\begin{aligned} \sum _{n=1}^N\frac{L_n}{2}\Vert x_{k+1}-x_{k-\tau _k^n}\Vert ^2&= \sum _{n=1}^N\frac{L_n}{2}\Vert (x_{k+1}-x_{k})+ \cdots +(x_{k+1-\tau _k^n} -x_{k-\tau _k^n})\Vert ^2 \\&\le \frac{L(\tau +1)}{2}\sum _{j=(k-\tau )_+}^k\Vert x_{j+1}-x_{j}\Vert ^2. \end{aligned}$$

Therefore, the desired inequality follows. This completes the proof.
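The update analyzed in the proof of Lemma 4, $x_{k+1} = \arg \min _z \{ h(z) + \frac{1}{2\alpha }\Vert z-(x_k+\beta (x_k-x_{k-1}) -\alpha g_k)\Vert ^2 \}$, can be illustrated with a minimal Python sketch. Everything specific below is an assumption made for illustration and is not taken from the paper's experiments: the components are least-squares terms $f_n(x) = \frac{1}{2}(a_n^\top x - b_n)^2$, the regularizer is $h = \lambda \Vert \cdot \Vert _1$ (so the minimizer is a soft-thresholding step), gradients are refreshed cyclically (delay bound $\tau = N-1$), $L$ is taken as $\sum _n \Vert a_n\Vert ^2$, and $\beta = 0.1$ is an arbitrary small inertial parameter rather than the bound from the theorems. The function names are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal map of t * ||.||_1 (assumed choice of h, for illustration only)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def objective(A, b, lam, x):
    """F(x) = 0.5 * ||A x - b||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))


def ipiag_lasso(A, b, lam, alpha, beta, num_iters=2000):
    """Sketch of an inertial proximal incremental aggregated gradient loop.

    One component gradient is refreshed per iteration in cyclic order, so every
    stored gradient is at most N - 1 iterations old.
    """
    N, d = A.shape
    x_prev = np.zeros(d)
    x = np.zeros(d)
    # Table of component gradients, each evaluated at a possibly outdated iterate.
    grads = A * (A @ x - b)[:, None]
    g = grads.sum(axis=0)  # aggregated gradient g_k
    for k in range(num_iters):
        n = k % N
        new_grad = A[n] * (A[n] @ x - b[n])
        g += new_grad - grads[n]  # swap the old gradient of component n for the new one
        grads[n] = new_grad
        # Inertial point minus the aggregated-gradient step, followed by the prox of alpha * h.
        y = x + beta * (x - x_prev) - alpha * g
        x_prev, x = x, soft_threshold(y, alpha * lam)
    return x


# Small illustrative problem (made-up data).
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = A @ np.array([1.0, 0.0, -2.0])
lam = 0.1
N = A.shape[0]
L = float(np.sum(A ** 2))      # assumed L = sum_n ||a_n||^2
alpha = 1.0 / (N * L)          # alpha = 1 / ((tau + 1) L) with tau = N - 1
beta = 0.1                     # illustrative inertial parameter
x_hat = ipiag_lasso(A, b, lam, alpha, beta)
```

Setting `beta = 0.0` in this sketch reduces the loop to a plain proximal incremental aggregated gradient iteration, which makes it easy to compare the two variants on the same data.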

About this article

Cite this article

Zhang, X., Peng, W. & Zhang, H. Inertial proximal incremental aggregated gradient method with linear convergence guarantees. Math Meth Oper Res 96, 187–213 (2022). https://doi.org/10.1007/s00186-022-00790-0

Received: 22 April 2021
Revised: 20 May 2022
Accepted: 23 May 2022
Published: 25 June 2022
Issue Date: October 2022
DOI: https://doi.org/10.1007/s00186-022-00790-0
