
Nested Alternating Minimization with FISTA for Non-convex and Non-smooth Optimization Problems

Published in: Journal of Optimization Theory and Applications

Abstract

Motivated by a recent framework for proving global convergence to critical points of nested alternating minimization algorithms, which was proposed for the case of smooth subproblems, we first show here that non-smooth subproblems can also be handled within this framework. Specifically, we present a novel analysis of an optimization scheme that utilizes the FISTA method as a nested algorithm. We establish the global convergence of this nested scheme to critical points of non-convex and non-smooth optimization problems. In addition, we propose a hybrid framework that makes it possible to implement FISTA when applicable, while still maintaining the global convergence result. The power of nested algorithms that use FISTA in the non-convex and non-smooth setting is illustrated by numerical experiments, which show their superiority over existing methods.


Data Availability

Data will be made available on request.


Acknowledgements

We express our gratitude to the anonymous reviewers whose valuable feedback has greatly contributed to enhancing the paper and making it more concise.

Funding

The work of Shoham Sabach and Eyal Gur was supported by the Israel Science Foundation, grant no. ISF 2480/21.

Author information


Corresponding author

Correspondence to Eyal Gur.

Additional information

Communicated by Russel Luke.


Appendix

A. FISTA Convergence Results

Here we prove several results about the FISTA method, which are used in the proof of Theorem 3.1 (see Sect. 3.1).

Assumption 1 Let \(\mathcal {A}\) be some algorithm and let \({\left\{ {\textbf{v}^j}\right\} }_{j\ge 0}\) be a sequence generated by \(\mathcal {A}\) for minimizing a \(\sigma \)-strongly convex function \(f:\mathbb {R}^n\rightarrow \left( -\infty ,\infty \right] \). Assume that there exists a sequence of scalars \({\left\{ {\beta ^j}\right\} }_{j\ge 0}\) such that:

(a) \(\beta ^j\rightarrow 0\) as \(j\rightarrow \infty \);

(b) \(f\left( {\textbf{v}^j} \right) -f\left( {\textbf{v}^*} \right) \le \beta ^j\left\| {\textbf{v}^0-\textbf{v}^*} \right\| ^2\) for all \(j\ge 0\), where \(\textbf{v}^*\in \mathbb {R}^n\) is the unique minimizer of f.

Notice that any convergent algorithm with a known convergence rate in terms of function values satisfies Assumption 1. In particular, following inequality (6), we see that FISTA satisfies this assumption.
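To make this concrete, the following standalone sketch (our own illustration, not the paper's Algorithm 1; the lasso-type instance and all data are made up) runs FISTA on a strongly convex problem \(f(\textbf{v})=\frac{1}{2}\Vert A\textbf{v}-\textbf{b}\Vert ^2+\lambda \Vert \textbf{v}\Vert _1\) and numerically checks Assumption 1(b) with the classical Beck–Teboulle rate \(\beta ^j=2L/(j+1)^2\), which also satisfies item (a):

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1, applied componentwise.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista(A, b, lam, x0, n_iter):
    # FISTA for f(v) = 0.5*||A v - b||^2 + lam*||v||_1.
    L = np.linalg.eigvalsh(A.T @ A)[-1]          # Lipschitz constant of the smooth part
    x, y, t = x0.copy(), x0.copy(), 1.0
    iterates = [x.copy()]
    for _ in range(n_iter):
        x_new = soft_threshold(y - A.T @ (A @ y - b) / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # extrapolation step
        x, t = x_new, t_new
        iterates.append(x.copy())
    return iterates, L

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))                # full column rank => f is strongly convex
b = rng.standard_normal(30)
lam, x0 = 0.1, np.zeros(10)
f = lambda v: 0.5 * np.linalg.norm(A @ v - b) ** 2 + lam * np.abs(v).sum()

iterates, L = fista(A, b, lam, x0, 200)
v_star = fista(A, b, lam, x0, 20000)[0][-1]      # high-accuracy surrogate for the minimizer
d0_sq = np.linalg.norm(x0 - v_star) ** 2

# Assumption 1(b) with beta^j = 2L/(j+1)^2 (the classical FISTA function-value rate):
assumption_holds = all(
    f(iterates[j]) - f(v_star) <= 2.0 * L * d0_sq / (j + 1) ** 2 + 1e-9
    for j in range(1, len(iterates))
)
print(assumption_holds)
```

Note that the exact minimizer is not available in closed form here, so a long FISTA run serves as a surrogate; the small additive tolerance absorbs the resulting floating-point slack.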

Lemma A.1

Let \(f:\mathbb {R}^n\rightarrow \left( -\infty ,\infty \right] \) be a \(\sigma \)-strongly convex function, and let \(\textbf{v}^*\in \mathbb {R}^n\) be its minimizer. Then,

(i) \(f\left( {\textbf{v}} \right) \ge f\left( {\textbf{v}^*} \right) +\left( {\sigma /2} \right) \cdot \left\| {\textbf{v}-\textbf{v}^*} \right\| ^2\) for any \(\textbf{v}\in \mathbb {R}^n\).

Let \({\left\{ {\textbf{v}^j}\right\} }_{j\ge 0}\) be a sequence generated by algorithm \(\mathcal {A}\) that satisfies Assumption 1. Then,

(ii) \(\left\| {\textbf{v}^j-\textbf{v}^*} \right\| \le \sqrt{2\beta ^j/\sigma }\left\| {\textbf{v}^0-\textbf{v}^*} \right\| \) for any \(j\ge 0\). In particular, \(\textbf{v}^j\rightarrow \textbf{v}^*\) as \(j\rightarrow \infty \).

(iii) If, in addition, \(\beta ^j\ge \beta ^{j+1}\) for any \(j\ge 0\), then \(\left\| {\textbf{v}^{j+1}-\textbf{v}^j} \right\| \le 2\sqrt{2\beta ^j/\sigma }\left\| {\textbf{v}^0-\textbf{v}^*} \right\| \).

Proof

Since f is \(\sigma \)-strongly convex, for any \(\textbf{v}\in \mathbb {R}^n\) and for any \(\varvec{\xi }\in \partial f\left( {\textbf{v}^*} \right) \), we have

$$\begin{aligned} f\left( {\textbf{v}} \right) \ge f\left( {\textbf{v}^*} \right) +\varvec{\xi }^T\left( {\textbf{v}-\textbf{v}^*} \right) +\frac{\sigma }{2}\left\| {\textbf{v}-\textbf{v}^*} \right\| ^2. \end{aligned}$$

Since \(\textbf{v}^*\) is a minimizer of the function f, the first-order optimality condition gives \(\textbf{0}_n\in \partial f\left( {\textbf{v}^*} \right) \), and item (i) follows. Now, item (ii) follows immediately from item (i) by plugging in \(\textbf{v}=\textbf{v}^j\) and using Assumption 1(b). Moreover, combining item (ii) with Assumption 1(a) yields \(\textbf{v}^j\rightarrow \textbf{v}^*\) as \(j\rightarrow \infty \), as required.

To prove item (iii), notice that if \(\beta ^j\ge \beta ^{j+1}\) for any \(j\ge 0\), then from the triangle inequality and item (ii) we get

$$\begin{aligned} \left\| {\textbf{v}^{j+1}-\textbf{v}^j} \right\| \le \sqrt{\frac{2\beta ^{j+1}}{\sigma }}\left\| {\textbf{v}^0-\textbf{v}^*} \right\| +\sqrt{\frac{2\beta ^j}{\sigma }}\left\| {\textbf{v}^0-\textbf{v}^*} \right\| \le 2\sqrt{\frac{2\beta ^j}{\sigma }}\left\| {\textbf{v}^0-\textbf{v}^*} \right\| , \end{aligned}$$

and the proof is completed.\(\square \)
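The bounds of Lemma A.1 are easy to verify numerically. The sketch below is our own toy instance, not from the paper: it minimizes a strongly convex quadratic with plain gradient descent, for which a decreasing rate \(\beta ^j=(L/2)(1-\sigma /L)^{2j}\) satisfying Assumption 1 is available in closed form, and then checks items (ii) and (iii):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((8, 8))
Q = B.T @ B + 0.5 * np.eye(8)                    # positive definite Hessian
c = rng.standard_normal(8)
eigs = np.linalg.eigvalsh(Q)
sigma, L = eigs[0], eigs[-1]                     # strong convexity / smoothness moduli
v_star = np.linalg.solve(Q, c)                   # unique minimizer of f(v) = 0.5 v'Qv - c'v

v = np.zeros(8)
vs = [v.copy()]
for _ in range(60):                              # gradient descent with step size 1/L
    v = v - (Q @ v - c) / L
    vs.append(v.copy())

d0 = np.linalg.norm(vs[0] - v_star)
beta = lambda j: 0.5 * L * (1.0 - sigma / L) ** (2 * j)   # decreasing, beta^j -> 0

# Lemma A.1(ii): distance to the minimizer is controlled by sqrt(2*beta^j/sigma).
item_ii = all(
    np.linalg.norm(vs[j] - v_star) <= np.sqrt(2.0 * beta(j) / sigma) * d0 + 1e-12
    for j in range(len(vs))
)
# Lemma A.1(iii): successive iterates are at most twice that bound apart.
item_iii = all(
    np.linalg.norm(vs[j + 1] - vs[j]) <= 2.0 * np.sqrt(2.0 * beta(j) / sigma) * d0 + 1e-12
    for j in range(len(vs) - 1)
)
print(item_ii and item_iii)
```

Gradient descent is used here only because its \(\beta ^j\) is explicit; any algorithm satisfying Assumption 1 would do.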

Using the inequalities obtained in Lemma A.1, in the following lemma we prove convergence results for the FISTA method in the strongly convex setting.

Lemma A.2

For \(k\ge 0\), let \({\left\{ {{{\textbf{x}}}^{k,j}}\right\} }_{j\ge 0}\) and \({\left\{ {\textbf{y}^{k,j}}\right\} }_{j\ge 0}\) be sequences generated by FISTA for Problem \(\left( {\textrm{P}^k} \right) \) (steps 7–9 in Algorithm 1). Then,

(i) \( \left\| {{{\textbf{x}}}^{k,j+1}-\textbf{y}^{k,j}} \right\| \le 4\left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^k_*} \right\| \sqrt{2\beta ^{k,j-1}_{\textrm{F}}/\sigma ^k}\) for all \(j\ge 1\).

(ii) For all \(j\ge 0\) we have

$$\begin{aligned} \varPsi ^k\left( {{{\textbf{x}}}^{k}} \right) -\varPsi ^k\left( {{{\textbf{x}}}^{k,j}} \right) \ge \frac{\sigma ^k}{2}\left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^{k,j}} \right\| ^2-\sqrt{2\sigma ^k\beta ^{k,j}_{\textrm{F}}}\left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^k_*} \right\| \cdot \left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^{k,j}} \right\| -\beta ^{k,j}_{\textrm{F}}\left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^{k,j}} \right\| ^2. \end{aligned}$$

(iii) \( \left\| {\textbf{w}^{k,j}} \right\| \le 8L^k\left\| {{{\textbf{x}}}^k-{{\textbf{x}}}^k_*} \right\| \sqrt{2\beta ^{k,j-2}_{\textrm{F}}/{\sigma ^k}}\) for all \(j\ge 2\) and some \(\textbf{w}^{k,j}\in \partial \varPsi ^k\left( {{{\textbf{x}}}^{k,j}} \right) \).

Proof

First, recall that \({{\textbf{x}}}^k={{\textbf{x}}}^{k,0}\) and that \({{\textbf{x}}}^{k,j^k}={{\textbf{x}}}^{k+1}\). In addition, recall that \({{\textbf{x}}}^k_*\) is the minimizer of the strongly convex function \(\varPsi ^k\) of Problem \(\left( {\textrm{P}^k} \right) \).

Since FISTA satisfies Assumption 1 with \(\beta ^{k,j}_{\textrm{F}}\) (see (6)), we can use Lemma A.1. Now we prove item (i). From Lemma A.1(iii), we have for all \(j\ge 0\) that

$$\begin{aligned} \left\| {{{\textbf{x}}}^{k,j+1}-{{\textbf{x}}}^{k,j}} \right\| \le 2\left\| {{{\textbf{x}}}^k-{{\textbf{x}}}^k_*} \right\| \sqrt{\frac{2\beta ^{k,j}_{\textrm{F}}}{\sigma ^k}}. \end{aligned}$$
(19)

Hence, for any \(j\ge 1\) it follows that

$$\begin{aligned} \begin{aligned} \left\| {{{\textbf{x}}}^{k,j+1}-\textbf{y}^{k,j}} \right\|&=\left\| {{{\textbf{x}}}^{k,j+1}-{{\textbf{x}}}^{k,j}-\frac{t_{j-1}-1}{t_j}\left( {{{\textbf{x}}}^{k,j}-{{\textbf{x}}}^{k,j-1}} \right) } \right\| \\&\quad \le \left\| {{{\textbf{x}}}^{k,j+1}-{{\textbf{x}}}^{k,j}} \right\| +\frac{t_{j-1}-1}{t_j}\left\| {{{\textbf{x}}}^{k,j}-{{\textbf{x}}}^{k,j-1}} \right\| \\&\le \left\| {{{\textbf{x}}}^{k,j+1}-{{\textbf{x}}}^{k,j}} \right\| +\left\| {{{\textbf{x}}}^{k,j}-{{\textbf{x}}}^{k,j-1}} \right\| \\&\quad \le 2\left\| {{{\textbf{x}}}^k-{{\textbf{x}}}^k_*} \right\| \sqrt{\frac{2\beta ^{k,j}_{\textrm{F}}}{\sigma ^k}}+2\left\| {{{\textbf{x}}}^k-{{\textbf{x}}}^k_*} \right\| \sqrt{\frac{2\beta ^{k,j-1}_{\textrm{F}}}{\sigma ^k}}\\&\le 4\left\| {{{\textbf{x}}}^k-{{\textbf{x}}}^k_*} \right\| \sqrt{\frac{2\beta ^{k,j-1}_{\textrm{F}}}{\sigma ^k}}, \end{aligned} \end{aligned}$$

where the first equality follows from step 9 in Algorithm 1, the second inequality follows from the fact that \(t_0=1\) and \(t_{j-1}\le t_j\) for any \(j\ge 1\) (see step 8 in Algorithm 1), the third inequality follows from (19), and the last inequality follows from the fact that \(\beta _{\textrm{F}}^{k,j}\le \beta _{\textrm{F}}^{k,j-1}\) for any \(j\ge 1\).

Now we prove item (ii). From Lemma A.1(i), we have, for any \(j\ge 0\), that

$$\begin{aligned} \varPsi ^k\left( {{{\textbf{x}}}^{k}} \right) -\varPsi ^k\left( {{{\textbf{x}}}^k_*} \right) +\varPsi ^k\left( {{{\textbf{x}}}^{k,j}} \right) -\varPsi ^k\left( {{{\textbf{x}}}^{k,j}} \right) \ge \frac{\sigma ^k}{2}\left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^k_*} \right\| ^2. \end{aligned}$$

Rearranging of the terms yields

$$\begin{aligned} \begin{aligned} \varPsi ^k\left( {{{\textbf{x}}}^{k}} \right) -\varPsi ^k\left( {{{\textbf{x}}}^{k,j}} \right)&\ge \frac{\sigma ^k}{2}\left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^k_*} \right\| ^2-\left( {\varPsi ^k\left( {{{\textbf{x}}}^{k,j}} \right) -\varPsi ^k\left( {{{\textbf{x}}}^k_*} \right) } \right) \\&=\frac{\sigma ^k}{2}\left\| {{{\textbf{x}}}^{k,j}-{{\textbf{x}}}^k_*} \right\| ^2+\frac{\sigma ^k}{2}\left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^{k,j}} \right\| ^2\\&\quad +\sigma ^k\left( {{{\textbf{x}}}^{k,j}-{{\textbf{x}}}^k_*} \right) ^T\left( {{{\textbf{x}}}^{k}-{{\textbf{x}}}^{k,j}} \right) -\left( {\varPsi ^k\left( {{{\textbf{x}}}^{k,j}} \right) -\varPsi ^k\left( {{{\textbf{x}}}^k_*} \right) } \right) . \end{aligned} \end{aligned}$$
(20)

Since \(\left( {\sigma ^k/2} \right) \cdot \left\| {{{\textbf{x}}}^{k,j}-{{\textbf{x}}}^k_*} \right\| ^2\ge 0\), we get from (20), using the Cauchy–Schwarz inequality,

$$\begin{aligned} \begin{aligned} \varPsi ^k\left( {{{\textbf{x}}}^{k}} \right) -\varPsi ^k\left( {{{\textbf{x}}}^{k,j}} \right)&\ge \frac{\sigma ^k}{2}\left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^{k,j}} \right\| ^2-\sigma ^k\left\| {{{\textbf{x}}}^{k,j}-{{\textbf{x}}}^k_*} \right\| \cdot \left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^{k,j}} \right\| \\&\quad -\left( {\varPsi ^k\left( {{{\textbf{x}}}^{k,j}} \right) -\varPsi ^k\left( {{{\textbf{x}}}^k_*} \right) } \right) \\&\ge \frac{\sigma ^k}{2}\left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^{k,j}} \right\| ^2-\sqrt{2\sigma ^k\beta ^{k,j}_{\textrm{F}}}\left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^k_*} \right\| \cdot \left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^{k,j}} \right\| \\&\quad -\beta ^{k,j}_{\textrm{F}}\left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^{k,j}} \right\| ^2, \end{aligned} \end{aligned}$$

where the second inequality follows from (6) and Assumption 1(b) with \(\beta ^j=\beta ^{k,j}_{\textrm{F}}\), and Lemma A.1(ii).

Now we prove item (iii). For any \(j\ge 1\), denote

$$\begin{aligned} \textbf{w}^{k,j}\equiv L^k\left( {\textbf{y}^{k,j-1}-{{\textbf{x}}}^{k,j}} \right) +\nabla \varphi ^k\left( {{{\textbf{x}}}^{k,j}} \right) -\nabla \varphi ^k\left( {\textbf{y}^{k,j-1}} \right) \in \partial \varPsi ^k\left( {{{\textbf{x}}}^{k,j}} \right) , \end{aligned}$$
(21)

where the inclusion follows from the first-order optimality condition of step 7 in Algorithm 1. Now, for any \(j\ge 2\) we get

$$\begin{aligned} \begin{aligned} \left\| {\textbf{w}^{k,j}} \right\|&\le L^k\left\| {{{\textbf{x}}}^{k,j}-\textbf{y}^{k,j-1}} \right\| +\left\| {\nabla \varphi ^k\left( {{{\textbf{x}}}^{k,j}} \right) -\nabla \varphi ^k\left( {\textbf{y}^{k,j-1}} \right) } \right\| \\&\le L^k\left\| {{{\textbf{x}}}^{k,j}-\textbf{y}^{k,j-1}} \right\| +L^k\left\| {{{\textbf{x}}}^{k,j}-\textbf{y}^{k,j-1}} \right\| \le 8L^k\left\| {{{\textbf{x}}}^k-{{\textbf{x}}}^k_*} \right\| \sqrt{\frac{2\beta ^{k,j-2}_{\textrm{F}}}{\sigma ^k}}, \end{aligned} \end{aligned}$$

where the second inequality follows from the \(L^k\)-Lipschitz continuity of \(\nabla \varphi ^k\), and the last inequality follows from item (i) applied with \(j-1\) in place of \(j\).\(\square \)
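In implementations, the subgradient certificate of (21) is available in closed form from quantities the proximal-gradient step already computes. The following sketch uses an illustrative instance of our own (smooth part \(\varphi (\textbf{v})=\frac{1}{2}\Vert A\textbf{v}-\textbf{b}\Vert ^2\) plus an \(\ell _1\) term, with made-up data; this is not the paper's Problem \(\left( {\textrm{P}^k} \right) \)) and checks the inclusion \(\textbf{w}\in \partial \varPsi \left( {{{\textbf{x}}}^{+}} \right) \) through the \(\ell _1\) subdifferential:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((20, 6))
b = rng.standard_normal(20)
lam = 0.3
L = np.linalg.eigvalsh(A.T @ A)[-1]             # Lipschitz constant of grad(phi)
grad = lambda v: A.T @ (A @ v - b)              # gradient of phi(v) = 0.5*||A v - b||^2

y = rng.standard_normal(6)                      # current (extrapolated) point
step = y - grad(y) / L
x_plus = np.sign(step) * np.maximum(np.abs(step) - lam / L, 0.0)  # prox step

# w = L*(y - x_plus) + grad(x_plus) - grad(y), as in (21); prox optimality then
# requires (w - grad(x_plus)) / lam to lie in the subdifferential of ||.||_1 at x_plus.
w = L * (y - x_plus) + grad(x_plus) - grad(y)
g = (w - grad(x_plus)) / lam
in_subdiff = all(
    (abs(g[i] - np.sign(x_plus[i])) < 1e-10) if x_plus[i] != 0.0 else (abs(g[i]) <= 1.0 + 1e-10)
    for i in range(6)
)
print(in_subdiff)
```

The check mirrors the first-order optimality condition of the proximal step: components where \(x^+_i\ne 0\) force \(g_i={{\,\textrm{sign}\,}}(x^+_i)\), while zero components only require \(|g_i|\le 1\).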

B. Proof of Proposition 3.1

Proposition B.1

For all \(k\ge 0\), let \({\left\{ {{{\textbf{x}}}^{k,j}}\right\} }_{j\ge 0}\) and \({\left\{ {\textbf{y}^{k,j}}\right\} }_{j\ge 0}\) be sequences generated by FISTA (steps 7–9 in Algorithm 1) for minimizing Problem \(\left( {\textrm{P}^k} \right) \). Assume that the sequence generated by Algorithm 1 is bounded. Then, there exists \(M>0\) such that for any \(k\ge 0\) it holds that

(i) \(\left\| {{{\textbf{x}}}^k-{{\textbf{x}}}^{k+1}} \right\| \le M\).

(ii) \(\left\| {{{\textbf{x}}}^k-{{\textbf{x}}}^k_*} \right\| \le M\).

(iii) \(\left\| {\nabla \varphi ^k\left( {\textbf{y}^{k,j}} \right) -L^k\textbf{y}^{k,j}} \right\| \le M\) for any \(j\ge 0\).

Proof

Since the sequence generated by NAM is bounded, there exists \(M_1>0\) such that

$$\begin{aligned} \left\| {\textbf{z}^k} \right\| \le M_1, \end{aligned}$$
(22)

and that

$$\begin{aligned} \left\| {\textbf{z}_i^k-\textbf{z}_i^{k+1}} \right\| =\left\| {{{\textbf{x}}}^k-{{\textbf{x}}}^{k+1}} \right\| \le M_1, \end{aligned}$$
(23)

for all \(k\ge 0\) and item (i) is established. To prove item (ii), notice that from item (i) we have for any \(k\ge 0\) that

$$\begin{aligned} \left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^k_*} \right\| \le \left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^{k+1}} \right\| +\left\| {{{\textbf{x}}}^{k+1}-{{\textbf{x}}}^{k}_*} \right\| \le M_1+\left\| {{{\textbf{x}}}^{k+1}-{{\textbf{x}}}^{k}_*} \right\| . \end{aligned}$$
(24)

From Lemma A.1(ii) and (7), we get

$$\begin{aligned} \left\| {{{\textbf{x}}}^{k+1}-{{\textbf{x}}}^{k}_*} \right\| =\left\| {{{\textbf{x}}}^{k,j^k}-{{\textbf{x}}}^{k}_*} \right\| \le \sqrt{\frac{2\beta ^{k,j^k}_{\textrm{F}}}{\sigma ^k}}\left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^{k}_*} \right\| \le \frac{2\sqrt{\kappa }}{2\sqrt{\kappa }+1}\left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^{k}_*} \right\| , \end{aligned}$$
(25)

where the second inequality follows from Assumptions 2(c) and 1(c) together with (5). Combining (24) and (25) we get

$$\begin{aligned} 0\le \left( {1-\frac{2\sqrt{\kappa }}{2\sqrt{\kappa }+1}} \right) \left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^{k}_*} \right\| \le M_1. \end{aligned}$$

Therefore, there exists \(M_2>0\), such that

$$\begin{aligned} \left\| {{{\textbf{x}}}^k-{{\textbf{x}}}^k_*} \right\| \le M_2, \end{aligned}$$
(26)

for all \(k\ge 0\), and item (ii) is established.

Now we prove item (iii). To this end, we first prove that the sequences \({\left\{ {{{\textbf{x}}}^{k,j}}\right\} }_{j\ge 0}\) and \({\left\{ {\textbf{y}^{k,j}}\right\} }_{j\ge 0}\) are bounded. For any \(j\ge 0\), we have

$$\begin{aligned} \left\| {{{\textbf{x}}}^{k,j}} \right\| \le \left\| {{{\textbf{x}}}^{k,j}-{{\textbf{x}}}^k_*} \right\| +\left\| {{{\textbf{x}}}^k_*} \right\| \le \frac{2\sqrt{\kappa }}{2\sqrt{\kappa }+1}\left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^k_*} \right\| +\left\| {{{\textbf{x}}}^k_*} \right\| \le \frac{2M_2\sqrt{\kappa }}{2\sqrt{\kappa }+1}+\left\| {{{\textbf{x}}}^k_*} \right\| , \end{aligned}$$
(27)

where the second inequality follows by similar arguments as in (25), and the last inequality follows from (26). In addition,

$$\begin{aligned} \left\| {{{\textbf{x}}}^{k}_*} \right\| \le \left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^{k}_*} \right\| +\left\| {{{\textbf{x}}}^{k}} \right\| \le M_2+\left\| {{{\textbf{x}}}^{k}} \right\| , \end{aligned}$$
(28)

and since the sequence \({\left\{ {{{\textbf{x}}}^k}\right\} }_{k\ge 0}\) is assumed to be bounded, it follows from (27) and (28) that there exists \(M_3>0\) such that \(\left\| {{{\textbf{x}}}^{k,j}} \right\| \le M_3\) for any \(k\ge 0\) and for any \(j\ge 0\). In addition,

$$\begin{aligned} \left\| {\textbf{y}^{k,j}} \right\| \le \left\| {{{\textbf{x}}}^{k,j+1}-\textbf{y}^{k,j}} \right\| +\left\| {{{\textbf{x}}}^{k,j+1}} \right\| \le 8\sqrt{\kappa }\left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^k_*} \right\| +M_3, \end{aligned}$$
(29)

where the second inequality follows from Lemma A.2(i) and the fact that \(\left\| {{{\textbf{x}}}^{k,j+1}} \right\| \le M_3\). Therefore, it follows from (26) and (29) that there exists \(M_4>0\) such that \(\left\| {\textbf{y}^{k,j}} \right\| \le M_4\) for any \(k\ge 0\) and for any \(j\ge 0\).

Last, notice that since the function \(G:\mathbb {R}^d\times \mathbb {R}^{d_0}\rightarrow \mathbb {R}\) in Problem (P) is continuously differentiable, its gradient is \(\mathcal {L}\)-Lipschitz continuous, for some \(\mathcal {L}>0\), over the compact set containing the bounded iterates involved (which is a subset of the domain \(\mathbb {R}^d\times \mathbb {R}^{d_0}\)). Hence, recalling that \(\varphi ^k\left( {{{\textbf{x}}}} \right) =G\left( {\textbf{z}_1^{k+1},\ldots ,\textbf{z}_{i-1}^{k+1},{{\textbf{x}}},\textbf{z}_{i+1}^k,\ldots ,\textbf{z}_p^k,\textbf{u}^{k+1}} \right) \), we have

$$\begin{aligned} \begin{aligned} \left\| {\nabla \varphi ^k\left( {\textbf{y}^{k,j}} \right) } \right\| -\left\| {\nabla _{\textbf{z}_i}G\left( {\textbf{0}_{d+d_0}} \right) } \right\|&\le \left\| {\nabla \varphi ^k\left( {\textbf{y}^{k,j}} \right) -\nabla _{\textbf{z}_i}G\left( {\textbf{0}_{d+d_0}} \right) } \right\| \\&\le {\mathcal {L}}\left( {\sum _{j=1}^{i-1}\left\| {\textbf{z}_j^{k+1}} \right\| +\left\| {\textbf{y}^{k,j}} \right\| +\sum _{j=i}^{p}\left\| {\textbf{z}_j^{k}} \right\| } \right) \\&\le {\mathcal {L}}\left( {M_4+pM_1} \right) , \end{aligned} \end{aligned}$$

where we used (22). Since \(\left\| {\nabla _{\textbf{z}_i}G\left( {\textbf{0}_{d+d_0}} \right) } \right\| \) is a constant independent of \(k\ge 0\) and \(j\ge 0\), it follows that there exists \(M_5>0\) such that \(\left\| {\nabla \varphi ^k\left( {\textbf{y}^{k,j}} \right) } \right\| \le M_5\). Hence,

$$\begin{aligned} \left\| {\nabla \varphi ^k\left( {\textbf{y}^{k,j}} \right) -L^k\textbf{y}^{k,j}} \right\| \le M_5+{\bar{L}}M_4, \end{aligned}$$
(30)

for any \(k\ge 0\) and for any \(j\ge 0\).

Finally, by setting \(M\equiv \max {\left\{ {M_1,M_2,M_5+{\bar{L}}M_4}\right\} }\), the required results follow from (23), (26) and (30).\(\square \)

About this article

Cite this article

Gur, E., Sabach, S. & Shtern, S. Nested Alternating Minimization with FISTA for Non-convex and Non-smooth Optimization Problems. J Optim Theory Appl 199, 1130–1157 (2023). https://doi.org/10.1007/s10957-023-02310-4
