
A nonmonotone accelerated proximal gradient method with variable stepsize strategy for nonsmooth and nonconvex minimization problems


Abstract

In this paper, we consider the problem of minimizing the sum of a nonsmooth function and a smooth function in the nonconvex setting, which arises in many contemporary applications such as machine learning, statistics, and signal/image processing. To solve this problem, we propose a new nonmonotone accelerated proximal gradient method with a variable stepsize strategy. Incorporating an inertial term into the proximal gradient method is a simple and efficient acceleration technique, but the descent property of the proximal gradient algorithm is lost. In our algorithm, an iterate generated by the inertial proximal gradient scheme is accepted when the objective function value decreases or increases only moderately; otherwise, the iterate is generated by the plain proximal gradient scheme, which guarantees that the function values decrease on a subset of the iterates. We also introduce a variable stepsize strategy that requires neither a line search nor knowledge of the Lipschitz constant, making the algorithm easy to implement. We show that the sequence of iterates generated by the algorithm converges to a critical point of the objective function. Further, under the assumption that the objective function satisfies the Kurdyka–Łojasiewicz inequality, we establish convergence rates for the objective function values and the iterates. Moreover, numerical results on both convex and nonconvex problems are reported to demonstrate the effectiveness and superiority of the proposed method and stepsize strategy.
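To make the accept/fallback structure concrete, the following is a minimal sketch of a nonmonotone inertial proximal gradient iteration on an \(\ell _1\)-regularized least-squares toy problem. It is not the authors' Algorithm 1: the acceptance threshold `delta`, the constant inertial weight `beta`, and the fixed stepsize `lam` (standing in for the paper's variable stepsize strategy) are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a nonmonotone inertial proximal gradient method for
# min_x 0.5*||Ax - b||^2 + reg*||x||_1. This is NOT the paper's Algorithm 1:
# the acceptance rule, beta, delta, and the fixed stepsize are assumptions.

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def objective(A, b, x, reg):
    return 0.5 * np.linalg.norm(A @ x - b) ** 2 + reg * np.linalg.norm(x, 1)

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
b = A @ (rng.standard_normal(100) * (rng.random(100) < 0.1))  # sparse signal
reg = 0.1
lam = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L_f for f(x) = 0.5*||Ax - b||^2
beta, delta = 0.6, 1e-3                # inertial weight, allowed increase

x_prev = x = np.zeros(100)
for k in range(200):
    y = x + beta * (x - x_prev)  # inertial extrapolation
    z = soft_threshold(y - lam * A.T @ (A @ y - b), lam * reg)
    # Nonmonotone test: accept the inertial point if the objective decreases
    # or increases by at most delta; otherwise take a plain proximal gradient
    # step from x, which restores descent on a subsequence of iterates.
    if objective(A, b, z, reg) <= objective(A, b, x, reg) + delta:
        x_prev, x = x, z
    else:
        x_prev, x = x, soft_threshold(x - lam * A.T @ (A @ x - b), lam * reg)

print("final objective:", objective(A, b, x, reg))
```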


Data availability

Data sharing is not applicable to this article as no new data were created or analyzed in this study.


Acknowledgements

This research is supported by the National Science Foundation of China (No. 12261019), the Shaanxi Provincial Science and Technology Projects (No. 2024JC-YBQN-0048) and the Guizhou Provincial Science and Technology Projects (No. QKHJC-ZK[2022]YB084).

Author information


Correspondence to Ting Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proof of Lemma 3.1

Proof

By the adaptive nonmonotone stepsize strategy, if \({\lambda _{k + 1}}\) is generated by (7), then

$$\begin{aligned}{\lambda _{k + 1}} - {\lambda _k} < 0 \le E\left( k \right) ;\end{aligned}$$

otherwise,

$$\begin{aligned}{\lambda _{k + 1}} - {\lambda _k} \le E\left( k \right) .\end{aligned}$$

Hence, for any \(i \ge 1\) we have

$$\begin{aligned} {\lambda _{i + 1}} - {\lambda _i} \le E\left( i \right) . \end{aligned}$$
(88)

Decomposing

$$\begin{aligned} {\lambda _{i + 1}} - {\lambda _i} = {\left( {{\lambda _{i + 1}} - {\lambda _i}} \right) ^ + } - {\left( {{\lambda _{i + 1}} - {\lambda _i}} \right) ^ - }, \quad \mathrm{where}\ {\left( \cdot \right) ^ + } = \max \{ 0, \cdot \} ,\ {\left( \cdot \right) ^ - } = - \min \{ 0, \cdot \} , \end{aligned}$$
(89)

we obtain from (88) that

$$\begin{aligned} {\left( {{\lambda _{i + 1}} - {\lambda _i}} \right) ^ + } \le E\left( i \right) , \quad \forall i = 1,2, \ldots , \end{aligned}$$
(90)

which implies that \(\sum \nolimits _{i = 1}^\infty {{{\left( {{\lambda _{i + 1}} - {\lambda _i}} \right) }^ + }} \) converges, since \(\sum \nolimits _{i = 1}^\infty {E\left( i \right) } \) is a convergent series with positive terms.

The convergence of \(\sum \nolimits _{i = 1}^\infty {{{\left( {{\lambda _{i + 1}} - {\lambda _i}} \right) }^ - }} \) can also be established, as follows.

Assume, by contradiction, that \(\sum \nolimits _{i = 1}^\infty {{{\left( {{\lambda _{i + 1}} - {\lambda _i}} \right) }^ - }} = + \infty .\) From the convergence of \(\sum \nolimits _{i = 1}^\infty {{{\left( {{\lambda _{i + 1}} - {\lambda _i}} \right) }^ + }} \) and the equality

$$\begin{aligned} {\lambda _{k + 1}} - {\lambda _1} = \sum \limits _{i = 1}^k {\left( {{\lambda _{i + 1}} - {\lambda _i}} \right) } = \sum \limits _{i = 1}^k {{{\left( {{\lambda _{i + 1}} - {\lambda _i}} \right) }^ + }} - \sum \limits _{i = 1}^k {{{\left( {{\lambda _{i + 1}} - {\lambda _i}} \right) }^ - }} , \end{aligned}$$
(91)

we deduce \(\mathop {\lim }\nolimits _{k \rightarrow \infty } {\lambda _k} = - \infty ,\) which contradicts \({\lambda _{k}} > 0\) for all \(k \ge 1.\) Therefore, \(\sum \nolimits _{i = 1}^\infty {{{\left( {{\lambda _{i + 1}} - {\lambda _i}} \right) }^ - }} \) is a convergent series. Then, in view of (91), the sequence \(\left\{ {{\lambda _k}} \right\} \) is convergent.

Finally, a simple induction shows that \( {\lambda _k} \ge \min \left\{ {{\lambda _1},\frac{{{\mu _1}}}{{{L_f}}}} \right\} \) for all \(k \ge 1.\) \(\square \)
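For intuition, here is a sketch in code of the two-branch stepsize update this proof relies on. Since equation (7) and the exact form of \(E(k)\) are not reproduced in this excerpt, the decrease branch is written as in (93), and \(E(k) = 1/(k+1)^2\) is merely an illustrative convergent positive series; both are assumptions, not the paper's exact rule.

```python
import numpy as np

# Sketch of the two-branch adaptive stepsize update underlying Lemma 3.1.
# Assumptions: the decrease branch follows (93); E(k) = 1/(k+1)^2 stands in
# for the paper's summable positive series E(k), which is not given here.

def update_stepsize(lam_k, k, f_x, f_y, grad_y, x, y, mu0, mu1):
    gap = f_x - f_y - grad_y @ (x - y)  # f(x) - f(y) - <grad f(y), x - y>
    sq = np.dot(x - y, x - y)
    if 2.0 * gap > (mu0 / lam_k) * sq:  # test (92): sufficient decrease fails
        # Decrease branch, cf. (93): since mu1 < mu0, this gives
        # lam_{k+1} <= (mu1/mu0) * lam_k < lam_k.
        return mu1 * sq / (2.0 * gap)
    # Otherwise the stepsize may grow, but by at most the summable E(k),
    # which is exactly inequality (88) used in the proof.
    return lam_k + 1.0 / (k + 1) ** 2
```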

Appendix B: Proof of Lemma 3.2

Proof

Suppose that the conclusion is not true. Then there exists a subsequence \(\left\{ {{k_j}} \right\} \) with \({k_j} \rightarrow \infty \) such that

$$\begin{aligned} 2\left( {f\left( {\varvec{x_{{k_j}}}} \right) - f\left( {\varvec{y_{{k_j}}}} \right) - \left\langle {\nabla f\left( {\varvec{y_{{k_j}}}} \right) ,\varvec{x_{{k_j}}} - \varvec{y_{{k_j}}}} \right\rangle } \right) > \frac{{{\mu _0}}}{{{\lambda _{{k_j}}}}}{\left\| {\varvec{x_{{k_j}}} - \varvec{y_{{k_j}}}} \right\| ^2} \end{aligned}$$
(92)

holds. Then, by the adaptive nonmonotone stepsize scheme, we have

$$\begin{aligned} {\lambda _{{k_j} + 1}} = \frac{{{\mu _1} \cdot {{\left\| {\varvec{x_{{k_j}}} - \varvec{y_{{k_j}}}} \right\| }^2}}}{{2\left( {f\left( {\varvec{x_{{k_j}}}} \right) - f\left( {\varvec{y_{{k_j}}}} \right) - \left\langle {\nabla f\left( {\varvec{y_{{k_j}}}} \right) ,\varvec{x_{{k_j}}} - \varvec{y_{{k_j}}}} \right\rangle } \right) }}. \end{aligned}$$
(93)

Combining the two displays above yields

$$\begin{aligned}&{\left\| {\varvec{x_{{k_j}}} - \varvec{y_{{k_j}}}} \right\| ^2} < \frac{{2{\lambda _{{k_j}}}}}{{{\mu _0}}}\left( {f\left( {\varvec{x_{{k_j}}}} \right) - f\left( {\varvec{y_{{k_j}}}} \right) - \left\langle {\nabla f\left( {\varvec{y_{{k_j}}}} \right) ,\varvec{x_{{k_j}}} - \varvec{y_{{k_j}}}} \right\rangle } \right) \nonumber \\&\quad = \frac{{{\mu _1}{\lambda _{{k_j}}}}}{{{\mu _0}{\lambda _{{k_j} + 1}}}}{\left\| {\varvec{x_{{k_j}}} - \varvec{y_{{k_j}}}} \right\| ^2}. \end{aligned}$$
(94)

Since \({\mu _1} < {\mu _0}\) and the sequence \(\left\{ {{\lambda _k}} \right\} \) is convergent (by Lemma 3.1), we get

$$\begin{aligned} \frac{{{\mu _1}{\lambda _{{k_j}}}}}{{{\mu _0}{\lambda _{{k_j} + 1}}}} \rightarrow \frac{{{\mu _1}}}{{{\mu _0}}} < 1, \end{aligned}$$
(95)

which contradicts (94), as the remark below makes explicit. Therefore, (15) holds for all \(k\) beyond some finite index \({\hat{k}}.\) \(\square \)
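Remark: to spell out the final contradiction, observe that the strict inequality (92) forces \(\varvec{x_{{k_j}}} \ne \varvec{y_{{k_j}}}\), so dividing (94) by \({\left\| {\varvec{x_{{k_j}}} - \varvec{y_{{k_j}}}} \right\| ^2} > 0\) and letting \(j \rightarrow \infty \) gives

$$\begin{aligned} 1 \le \mathop {\lim }\limits _{j \rightarrow \infty } \frac{{{\mu _1}{\lambda _{{k_j}}}}}{{{\mu _0}{\lambda _{{k_j} + 1}}}} = \frac{{{\mu _1}}}{{{\mu _0}}} < 1, \end{aligned}$$

which is impossible.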

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, H., Wang, T. & Liu, Z. A nonmonotone accelerated proximal gradient method with variable stepsize strategy for nonsmooth and nonconvex minimization problems. J Glob Optim 89, 863–897 (2024). https://doi.org/10.1007/s10898-024-01366-4

