Abstract
Motivated by a recent framework for proving global convergence to critical points of nested alternating minimization algorithms, which was proposed for the case of smooth subproblems, we first show here that non-smooth subproblems can also be handled within this framework. Specifically, we present a novel analysis of an optimization scheme that utilizes the FISTA method as a nested algorithm. We establish the global convergence of this nested scheme to critical points of non-convex and non-smooth optimization problems. In addition, we propose a hybrid framework that allows one to implement FISTA when applicable, while still maintaining the global convergence result. The power of nested algorithms using FISTA in the non-convex and non-smooth setting is illustrated through numerical experiments that demonstrate their superiority over existing methods.
Data Availability
Data will be made available on request.
Acknowledgements
We express our gratitude to the anonymous reviewers whose valuable feedback has greatly contributed to enhancing the paper and making it more concise.
Funding
The work of Shoham Sabach and Eyal Gur was supported by the Israel Science Foundation, grant no. ISF 2480/21.
Additional information
Communicated by Russel Luke.
Appendix
A. FISTA Convergence Results
Here we prove several results about the FISTA method, which are used in the proof of Theorem 3.1 (see Sect. 3.1).
Assumption 1 Let \(\mathcal {A}\) be some algorithm and let \({\left\{ {\textbf{v}^j}\right\} }_{j\ge 0}\) be a sequence generated by \(\mathcal {A}\) for minimizing a \(\sigma \)-strongly convex function \(f:\mathbb {R}^n\rightarrow \left( -\infty ,\infty \right] \). Assume that there exists a sequence of scalars \({\left\{ {\beta ^j}\right\} }_{j\ge 0}\) such that

(a) \(\beta ^j\rightarrow 0\) as \(j\rightarrow \infty \).

(b) \(f\left( {\textbf{v}^j} \right) -f\left( {\textbf{v}^*} \right) \le \beta ^j\left\| {\textbf{v}^0-\textbf{v}^*} \right\| ^2\), where \(\textbf{v}^*\in \mathbb {R}^n\) is the unique minimizer of f.
Notice that any convergent algorithm with a known convergence rate in terms of function values satisfies Assumption 1. In particular, following inequality (6), we see that FISTA satisfies this assumption.
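For concreteness, the FISTA iteration whose rate certifies Assumption 1 can be sketched in a few lines. The following is a minimal, self-contained illustration on the convex \(\ell _1\)-regularized least-squares problem; the data, function names, and parameter values are illustrative and are not taken from the paper's Algorithm 1.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista(A, b, lam, n_iter=500):
    # FISTA for min_x 0.5 * ||Ax - b||^2 + lam * ||x||_1.
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)                          # gradient of smooth part at y
        x_next = soft_threshold(y - grad / L, lam / L)    # proximal-gradient step
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)  # momentum extrapolation
        x, t = x_next, t_next
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
x_hat = fista(A, b, lam=0.1)
```

Here the \(O(1/j^2)\) function-value rate of FISTA plays the role of the scalars \(\beta ^j\) in Assumption 1.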
Lemma A.1
Let \(f:\mathbb {R}^n\rightarrow \left( -\infty ,\infty \right] \) be a \(\sigma \)-strongly convex function, and let \(\textbf{v}^*\in \mathbb {R}^n\) be its unique minimizer. Then,

(i) \(f\left( {\textbf{v}} \right) \ge f\left( {\textbf{v}^*} \right) +\left( {\sigma /2} \right) \cdot \left\| {\textbf{v}-\textbf{v}^*} \right\| ^2\) for any \(\textbf{v}\in \mathbb {R}^n\).

Let \({\left\{ {\textbf{v}^j}\right\} }_{j\ge 0}\) be a sequence generated by an algorithm \(\mathcal {A}\) that satisfies Assumption 1. Then,

(ii) \(\left\| {\textbf{v}^j-\textbf{v}^*} \right\| \le \sqrt{2\beta ^j/\sigma }\left\| {\textbf{v}^0-\textbf{v}^*} \right\| \) for any \(j\ge 0\). In particular, \(\textbf{v}^j\rightarrow \textbf{v}^*\) as \(j\rightarrow \infty \).

(iii) If, in addition, \(\beta ^j\ge \beta ^{j+1}\) for any \(j\ge 0\), then \(\left\| {\textbf{v}^{j+1}-\textbf{v}^j} \right\| \le 2\sqrt{2\beta ^j/\sigma }\left\| {\textbf{v}^0-\textbf{v}^*} \right\| \).
Proof
Since f is \(\sigma \)-strongly convex, for any \(\textbf{v}\in \mathbb {R}^n\) and for any \(\varvec{\xi }\in \partial f\left( {\textbf{v}^*} \right) \), we have
$$f\left( {\textbf{v}} \right) \ge f\left( {\textbf{v}^*} \right) +\left\langle {\varvec{\xi },\textbf{v}-\textbf{v}^*} \right\rangle +\frac{\sigma }{2}\left\| {\textbf{v}-\textbf{v}^*} \right\| ^2.$$
Since \(\textbf{v}^*\) is a minimizer of the function f, the first-order optimality condition gives \(\textbf{0}_n\in \partial f\left( {\textbf{v}^*} \right) \); choosing \(\varvec{\xi }=\textbf{0}_n\) establishes item (i). Item (ii) then follows immediately from item (i) by plugging in \(\textbf{v}=\textbf{v}^j\) and using Assumption 1(b). Moreover, from Assumption 1(a) we obtain \(\textbf{v}^j\rightarrow \textbf{v}^*\) as \(j\rightarrow \infty \), as required.
To prove item (iii), notice that if \(\beta ^j\ge \beta ^{j+1}\) for any \(j\ge 0\), then from the triangle inequality and item (ii) we get
$$\left\| {\textbf{v}^{j+1}-\textbf{v}^j} \right\| \le \left\| {\textbf{v}^{j+1}-\textbf{v}^*} \right\| +\left\| {\textbf{v}^j-\textbf{v}^*} \right\| \le \left( {\sqrt{2\beta ^{j+1}/\sigma }+\sqrt{2\beta ^j/\sigma }} \right) \left\| {\textbf{v}^0-\textbf{v}^*} \right\| \le 2\sqrt{2\beta ^j/\sigma }\left\| {\textbf{v}^0-\textbf{v}^*} \right\| ,$$
and the proof is complete.\(\square \)
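The bounds of Lemma A.1 can be checked numerically on a simple instance. The sketch below takes \(\mathcal {A}\) to be gradient descent with step size \(1/L\) on a strongly convex quadratic, for which Assumption 1 holds with \(\beta ^j=\left( {L/2} \right) \left( {1-\sigma /L} \right) ^j\); this construction is illustrative only and is not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
Q = M.T @ M + np.eye(n)             # symmetric, eigenvalues >= 1, so f is strongly convex
c = rng.standard_normal(n)
eigs = np.linalg.eigvalsh(Q)        # ascending order
sigma, L = eigs[0], eigs[-1]        # strong convexity and smoothness constants

def f(v):
    # f(v) = 0.5 v'Qv - c'v, with unique minimizer v* = Q^{-1} c
    return 0.5 * v @ Q @ v - c @ v

v_star = np.linalg.solve(Q, c)
v0 = rng.standard_normal(n)
v, d0 = v0.copy(), np.linalg.norm(v0 - v_star)
for j in range(300):
    beta_j = (L / 2.0) * (1.0 - sigma / L) ** j
    # Assumption 1(b): function-value gap bounded by beta_j * ||v0 - v*||^2
    assert f(v) - f(v_star) <= beta_j * d0 ** 2 + 1e-10
    # Lemma A.1(ii): distance bound implied by strong convexity
    assert np.linalg.norm(v - v_star) <= np.sqrt(2.0 * beta_j / sigma) * d0 + 1e-9
    v = v - (Q @ v - c) / L         # gradient step with step size 1/L
```

For this quadratic, the gradient step contracts the distance to \(\textbf{v}^*\) by a factor of at most \(1-\sigma /L\) per iteration, which is exactly why the stated \(\beta ^j\) satisfies Assumption 1.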
Using the inequalities obtained in Lemma A.1, in the following lemma we prove convergence results for the FISTA method in the strongly convex setting.
Lemma A.2
For \(k\ge 0\), let \({\left\{ {{{\textbf{x}}}^{k,j}}\right\} }_{j\ge 0}\) and \({\left\{ {\textbf{y}^{k,j}}\right\} }_{j\ge 0}\) be sequences generated by FISTA for Problem \(\left( {\textrm{P}^k} \right) \) (steps 7–9 in Algorithm 1). Then,
(i) \(\left\| {{{\textbf{x}}}^{k,j+1}-\textbf{y}^{k,j}} \right\| \le 4\left\| {{{\textbf{x}}}^{k,j}-{{\textbf{x}}}^k_*} \right\| \sqrt{2\beta ^{k,j-1}_{\textrm{F}}/\sigma ^k}\) for all \(j\ge 1\).

(ii) For all \(j\ge 0\) we have
$$\begin{aligned} \varPsi ^k\left( {{{\textbf{x}}}^{k}} \right) -\varPsi ^k\left( {{{\textbf{x}}}^{k,j}} \right)&\ge \frac{\sigma ^k}{2}\left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^{k,j}} \right\| ^2-\sqrt{2\sigma ^k\beta ^{k,j}_{\textrm{F}}}\left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^k_*} \right\| \left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^{k,j}} \right\| \\&\quad -\beta ^{k,j}_{\textrm{F}}\left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^{k,j}} \right\| ^2.\end{aligned}$$

(iii) \(\left\| {\textbf{w}^{k,j}} \right\| \le 8L^k\left\| {{{\textbf{x}}}^k-{{\textbf{x}}}^k_*} \right\| \sqrt{2\beta ^{k,j-2}_{\textrm{F}}/{\sigma ^k}}\) for all \(j\ge 2\) and some \(\textbf{w}^{k,j}\in \partial \varPsi ^k\left( {{{\textbf{x}}}^{k,j}} \right) \).
Proof
First, recall that \({{\textbf{x}}}^k={{\textbf{x}}}^{k,0}\) and that \({{\textbf{x}}}^{k,j^k}={{\textbf{x}}}^{k+1}\). In addition, recall that \({{\textbf{x}}}^k_*\) is the minimizer of the strongly convex function \(\varPsi ^k\) of Problem \(\left( {\textrm{P}^k} \right) \).
Since FISTA satisfies Assumption 1 with \(\beta ^{k,j}_{\textrm{F}}\) (see (6)), we can use Lemma A.1. Now we prove item (i). From Lemma A.1(iii), we have for all \(j\ge 0\) that
Hence, for any \(j\ge 1\) it follows that
where the first equality follows from step 9 in Algorithm 1, the second inequality follows from the fact that \(t_0=1\) and \(t_{j-1}\le t_j\) for any \(j\ge 1\) (see step 8 in Algorithm 1), the third inequality follows from (19), and the last inequality follows from the fact that \(\beta _{\textrm{F}}^{k,j}\le \beta _{\textrm{F}}^{k,j-1}\) for any \(j\ge 1\).
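The monotonicity \(t_{j-1}\le t_j\) invoked here is immediate from the update rule. Assuming step 8 of Algorithm 1 uses the standard FISTA rule \(t_{j+1}=\left( {1+\sqrt{1+4t_j^2}} \right) /2\) with \(t_0=1\), it can be verified directly:

```python
import math

# Standard FISTA momentum sequence: t_0 = 1, t_{j+1} = (1 + sqrt(1 + 4 t_j^2)) / 2.
t = 1.0
for j in range(200):
    t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
    assert t_next >= t                  # the sequence is nondecreasing
    assert t_next >= (j + 3) / 2.0      # known lower bound: t_j >= (j + 2) / 2
    t = t_next
```

The lower bound follows by induction, since \(t_{j+1}\ge \left( {1+2t_j} \right) /2=t_j+1/2\).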
Now we prove item (ii). From Lemma A.1(i), we have, for any \(j\ge 0\), that
Rearranging the terms yields
Since \(\left( {\sigma ^k/2} \right) \cdot \left\| {{{\textbf{x}}}^{k,j}-{{\textbf{x}}}^k_*} \right\| \ge 0\), from (20) and the Cauchy–Schwarz inequality we get
where the second inequality follows from (6) and Assumption 1(b) with \(\beta ^j=\beta ^{k,j}_{\textrm{F}}\), and Lemma A.1(ii).
Now we prove item (iii). For any \(j\ge 1\), denote
where the inclusion follows from the first-order optimality condition of step 7 in Algorithm 1. Now, for any \(j\ge 2\) we get
where the last inequality follows from item (i), and the proof is complete.\(\square \)
B. Proof of Proposition 3.1
Proposition B.1
For all \(k\ge 0\), let \({\left\{ {{{\textbf{x}}}^{k,j}}\right\} }_{j\ge 0}\) and \({\left\{ {\textbf{y}^{k,j}}\right\} }_{j\ge 0}\) be sequences generated by FISTA (steps 7–9 in Algorithm 1) for minimizing Problem \(\left( {\textrm{P}^k} \right) \). Assume that the sequence generated by Algorithm 1 is bounded. Then, there exists \(M>0\) such that for any \(k\ge 0\) it holds that
(i) \(\left\| {{{\textbf{x}}}^k-{{\textbf{x}}}^{k+1}} \right\| \le M\).

(ii) \(\left\| {{{\textbf{x}}}^k-{{\textbf{x}}}^k_*} \right\| \le M\).

(iii) \(\left\| {\nabla \varphi ^k\left( {\textbf{y}^{k,j}} \right) -L^k\textbf{y}^{k,j}} \right\| \le M\) for any \(j\ge 0\).
Proof
Since the sequence generated by NAM is bounded, there exists \(M_1>0\) such that
and that
for all \(k\ge 0\), and item (i) is established. To prove item (ii), notice that from item (i) we have for any \(k\ge 0\) that
From Lemma A.1(ii) and (7), we get
where the second inequality follows from Assumptions 1, 2(c) and 1(c), and the last inequality follows from (5). Combining (24) and (25), we get
Therefore, there exists \(M_2>0\), such that
for all \(k\ge 0\), and item (ii) is established.
Now we prove item (iii). To this end, we first prove that the sequences \({\left\{ {{{\textbf{x}}}^{k,j}}\right\} }_{j\ge 0}\) and \({\left\{ {\textbf{y}^{k,j}}\right\} }_{j\ge 0}\) are bounded. For any \(j\ge 0\), we have
where the second inequality follows by similar arguments as in (25), and the last inequality follows from (26). In addition,
and since the sequence \({\left\{ {{{\textbf{x}}}^k}\right\} }_{k\ge 0}\) is assumed to be bounded, it follows from (27) and (28) that there exists \(M_3>0\) such that \(\left\| {{{\textbf{x}}}^{k,j}} \right\| \le M_3\) for any \(k\ge 0\) and for any \(j\ge 0\). In addition,
where the second inequality follows from Lemma A.2(i) and the fact that \(\left\| {{{\textbf{x}}}^{k,j+1}} \right\| \le M_3\). Therefore, it follows from (26) and (29) that there exists \(M_4>0\) such that \(\left\| {\textbf{y}^{k,j}} \right\| \le M_4\) for any \(k\ge 0\) and for any \(j\ge 0\).
Last, notice that since the function \(G:\mathbb {R}^d\times \mathbb {R}^{d_0}\rightarrow \mathbb {R}\) in Problem (P) is continuously differentiable, its gradient is \(\mathcal {L}\)-Lipschitz continuous, for some \(\mathcal {L}>0\), over the compact set containing the bounded iterates (which is a subset of the domain \(\mathbb {R}^d\times \mathbb {R}^{d_0}\)). Hence, we have (recall that \(\varphi ^k\left( {{{\textbf{x}}}} \right) =G\left( {\textbf{z}_1^{k+1},\ldots ,\textbf{z}_{i-1}^{k+1},{{\textbf{x}}},\textbf{z}_{i+1}^k,\ldots ,\textbf{z}_p^k,\textbf{u}^{k+1}} \right) \))
where we used (22). Since \(\left\| {\nabla _{\textbf{z}_i}G\left( {\textbf{0}_{d+d_0}} \right) } \right\| \) is a constant independent of \(k\ge 0\) and \(j\ge 0\), it follows that there exists \(M_5>0\) such that \(\left\| {\nabla \varphi ^k\left( {\textbf{y}^{k,j}} \right) } \right\| \le M_5\). Hence,
for any \(k\ge 0\) and for any \(j\ge 0\).
Finally, by setting \(M\equiv \max {\left\{ {M_1,M_2,M_5+{\bar{L}}M_4}\right\} }\), the required results follow from (23), (26) and (30).
Cite this article
Gur, E., Sabach, S. & Shtern, S. Nested Alternating Minimization with FISTA for Non-convex and Non-smooth Optimization Problems. J Optim Theory Appl 199, 1130–1157 (2023). https://doi.org/10.1007/s10957-023-02310-4
Keywords
- Non-convex and non-smooth optimization
- Alternating minimization
- Global convergence
- Nested algorithms
- FISTA