Abstract
In this paper, we consider the problem of minimizing the sum of a nonsmooth function and a smooth one in the nonconvex setting, which arises in many contemporary applications such as machine learning, statistics, and signal/image processing. To solve this problem, we propose a new nonmonotone accelerated proximal gradient method with a variable stepsize strategy. Note that incorporating an inertial term into the proximal gradient method is a simple and efficient acceleration technique, but the descent property of the proximal gradient algorithm is lost. In our algorithm, the iterates generated by the inertial proximal gradient scheme are accepted when the objective function values decrease or increase appropriately; otherwise, the iteration point is generated by the proximal gradient scheme, which guarantees that the function values are decreasing on a subset of the iterates. We also introduce a variable stepsize strategy that requires neither a line search nor knowledge of the Lipschitz constant, and makes the algorithm easy to implement. We show that the sequence of iterates generated by the algorithm converges to a critical point of the objective function. Further, under the assumption that the objective function satisfies the Kurdyka–Łojasiewicz inequality, we establish convergence rates for the objective function values and the iterates. Moreover, numerical results on both convex and nonconvex problems are reported to demonstrate the effectiveness and superiority of the proposed method and stepsize strategy.
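For concreteness, the following is a minimal, self-contained Python sketch of the kind of nonmonotone acceptance rule described above, applied to a toy lasso-type problem. It is an illustration only and not the authors' algorithm: the fixed extrapolation parameter `beta`, the acceptance tolerance `delta`, the constant stepsize `lam`, and the toy problem data are assumptions introduced here; in particular, the paper's variable stepsize strategy and its precise acceptance test are not reproduced.

```python
import numpy as np

# Illustrative composite problem F(x) = f(x) + g(x) with
#   f(x) = 0.5 * ||A x - b||^2   (smooth part)
#   g(x) = mu * ||x||_1          (nonsmooth part)
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
b = rng.standard_normal(40)
mu = 0.1

def f(x): return 0.5 * np.linalg.norm(A @ x - b) ** 2
def grad_f(x): return A.T @ (A @ x - b)
def g(x): return mu * np.abs(x).sum()
def prox_g(x, lam):  # proximal map of lam * g: soft-thresholding
    return np.sign(x) * np.maximum(np.abs(x) - lam * mu, 0.0)
def F(x): return f(x) + g(x)

def nonmonotone_apg(x0, lam, beta=0.5, delta=1e-3, iters=500):
    """Toy nonmonotone accelerated proximal gradient loop (a sketch only).

    An inertial (extrapolated) proximal gradient step is tried first and
    kept if the objective does not increase by more than `delta`;
    otherwise a plain proximal gradient step from the current iterate is
    taken, so the objective decreases along a subsequence of iterates.
    A fixed stepsize `lam` stands in for the paper's variable stepsize.
    """
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(iters):
        y = x + beta * (x - x_prev)                   # inertial extrapolation
        z = prox_g(y - lam * grad_f(y), lam)          # accelerated (inertial) step
        if F(z) <= F(x) + delta:                      # nonmonotone acceptance test
            x_new = z
        else:
            x_new = prox_g(x - lam * grad_f(x), lam)  # fallback: monotone prox-grad step
        x_prev, x = x, x_new
    return x

# stepsize 1/L with L = ||A||_2^2, the Lipschitz constant of grad_f
x_star = nonmonotone_apg(np.zeros(100), lam=1.0 / np.linalg.norm(A, 2) ** 2)
print("final objective:", F(x_star))
```

When the inertial trial point is rejected, the fallback proximal gradient step restores descent, mirroring the subset of iterates with decreasing function values mentioned in the abstract.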





Data availability
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
Acknowledgements
This research is supported by the National Science Foundation of China (No. 12261019), the Shaanxi Provincial Science and Technology Projects (No. 2024JC-YBQN-0048), and the Guizhou Provincial Science and Technology Projects (No. QKHJC-ZK[2022]YB084).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Proof of Lemma 3.1
Proof
By the adaptive non-monotone stepsize strategy, if \({\lambda _{k + 1}}\) is generated by (7), then,
otherwise,
Hence, for any \(i \ge 1\) we have
Denote that
we have
which implies that \(\sum \nolimits _{i = 1}^\infty {{{\left( {{\lambda _{i + 1}} - {\lambda _i}} \right) }^ + }} \) is convergent, since \(\sum \nolimits _{i = 1}^\infty {\textrm{E}\left( i \right) } \) is a convergent series with positive terms.
The convergence of \(\sum \nolimits _{i = 1}^\infty {{{\left( {{\lambda _{i + 1}} - {\lambda _i}} \right) }^ - }} \) can also be established, as follows.
Assume, by contradiction, that \(\sum \nolimits _{i = 1}^\infty {{{\left( {{\lambda _{i + 1}} - {\lambda _i}} \right) }^ - }} = + \infty .\) Based on the convergence of \(\sum \nolimits _{i = 1}^\infty {{{\left( {{\lambda _{i + 1}} - {\lambda _i}} \right) }^ + }} \) and the equality
we can easily deduce \(\mathop {\lim }\nolimits _{k \rightarrow \infty } {\lambda _k} = - \infty ,\) which contradicts \({\lambda _{k}} > 0\) for all \(k \ge 1.\) Therefore, \(\sum \nolimits _{i = 1}^\infty {{{\left( {{\lambda _{i + 1}} - {\lambda _i}} \right) }^ - }} \) is a convergent series. Then, in view of (91), we conclude that the sequence \(\left\{ {{\lambda _k}} \right\} \) is convergent.
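Although the displayed equality labeled (91) is not reproduced here, the step being invoked is presumably the elementary decomposition of the stepsize sequence into the positive and negative parts of its increments; under that assumption it reads
\( {\lambda _k} = {\lambda _1} + \sum \nolimits _{i = 1}^{k - 1} {{{\left( {{\lambda _{i + 1}} - {\lambda _i}} \right) }^ + }} - \sum \nolimits _{i = 1}^{k - 1} {{{\left( {{\lambda _{i + 1}} - {\lambda _i}} \right) }^ - }} ,\quad \forall k \ge 1, \)
so convergence of both series yields convergence of \(\left\{ {{\lambda _k}} \right\} ,\) whereas divergence of the negative-part series together with convergence of the positive-part series would force \({\lambda _k} \rightarrow - \infty ,\) as used above.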
Finally, it is easy to prove by induction that \( {\lambda _k} \ge \min \left\{ {{\lambda _1},\frac{{{\mu _1}}}{{{L_f}}}} \right\} \) holds for all \(k \ge 1.\) \(\square \)
Appendix B: Proof of Lemma 3.2
Proof
Suppose that the conclusion is not true. Then there exists a subsequence \(\left\{ {{k_j}} \right\} \) with \({k_j} \rightarrow \infty \) such that
holds. Then, by the adaptive nonmonotone stepsize scheme, we have
From the above two formulas, it is easy to obtain
Based on the fact that \({\mu _1} < {\mu _0}\) and the convergence of the sequence \(\left\{ {{\lambda _k}} \right\} ,\) we obtain
which contradicts (94). Therefore, (15) holds for all iterations beyond some finite index \({\hat{k}}.\) \(\square \)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, H., Wang, T. & Liu, Z. A nonmonotone accelerated proximal gradient method with variable stepsize strategy for nonsmooth and nonconvex minimization problems. J Glob Optim 89, 863–897 (2024). https://doi.org/10.1007/s10898-024-01366-4
Keywords
- Nonconvex
- Nonsmooth
- Accelerated proximal gradient method
- Variable stepsize strategy
- Kurdyka–Łojasiewicz property
- Convergence