Abstract
Motivated by a recent framework for proving global convergence to critical points of nested alternating minimization algorithms, which was proposed for the case of smooth subproblems, we first show here that non-smooth subproblems can also be handled within this framework. Specifically, we present a novel analysis of an optimization scheme that utilizes the FISTA method as a nested algorithm. We establish the global convergence of this nested scheme to critical points of non-convex and non-smooth optimization problems. In addition, we propose a hybrid framework that allows one to implement FISTA when applicable, while still maintaining the global convergence result. The power of nested algorithms using FISTA in the non-convex and non-smooth setting is illustrated through numerical experiments that demonstrate their superiority over existing methods.
Data Availability
Data will be made available on request.
Acknowledgements
We express our gratitude to the anonymous reviewers whose valuable feedback has greatly contributed to enhancing the paper and making it more concise.
Funding
The work of Shoham Sabach and Eyal Gur was supported by the Israel Science Foundation, grant no. ISF 2480/21.
Additional information
Communicated by Russel Luke.
Appendix
A. FISTA Convergence Results
Here we prove several results about the FISTA method, which are used in the proof of Theorem 3.1 (see Sect. 3.1).
Assumption 1 Let \(\mathcal {A}\) be some algorithm and let \({\left\{ {\textbf{v}^j}\right\} }_{j\ge 0}\) be a sequence generated by \(\mathcal {A}\) for minimizing a \(\sigma \)-strongly convex function \(f:\mathbb {R}^n\rightarrow \left( -\infty ,\infty \right] \). Assume that there exists a sequence of scalars \({\left\{ {\beta ^j}\right\} }_{j\ge 0}\) such that

(a) \(\beta ^j\rightarrow 0\) as \(j\rightarrow \infty \).

(b) \(f\left( {\textbf{v}^j} \right) -f\left( {\textbf{v}^*} \right) \le \beta ^j\left\| {\textbf{v}^0-\textbf{v}^*} \right\| ^2\), where \(\textbf{v}^*\in \mathbb {R}^n\) is the unique minimizer of f.
Notice that any convergent algorithm with a known convergence rate in terms of function values satisfies Assumption 1. In particular, following inequality (6), we see that FISTA satisfies this assumption.
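For concreteness, the FISTA iteration whose rate certifies Assumption 1 can be sketched in a few lines. The following is a minimal, self-contained illustration on the convex \(\ell _1\)-regularized least-squares problem; the data, function names, and parameter values are illustrative and are not taken from the paper's Algorithm 1.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista(A, b, lam, n_iter=500):
    # FISTA for min_x 0.5 * ||Ax - b||^2 + lam * ||x||_1.
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)                          # gradient of smooth part at y
        x_next = soft_threshold(y - grad / L, lam / L)    # proximal-gradient step
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)  # momentum extrapolation
        x, t = x_next, t_next
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
x_hat = fista(A, b, lam=0.1)
```

Here the \(O(1/j^2)\) function-value rate of FISTA plays the role of the scalars \(\beta ^j\) in Assumption 1.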
Lemma A.1
Let \(f:\mathbb {R}^n\rightarrow \left( -\infty ,\infty \right] \) be a \(\sigma \)-strongly convex function, and let \(\textbf{v}^*\in \mathbb {R}^n\) be its unique minimizer. Then,

(i) \(f\left( {\textbf{v}} \right) \ge f\left( {\textbf{v}^*} \right) +\left( {\sigma /2} \right) \cdot \left\| {\textbf{v}-\textbf{v}^*} \right\| ^2\) for any \(\textbf{v}\in \mathbb {R}^n\).

Let \({\left\{ {\textbf{v}^j}\right\} }_{j\ge 0}\) be a sequence generated by an algorithm \(\mathcal {A}\) that satisfies Assumption 1. Then,

(ii) \(\left\| {\textbf{v}^j-\textbf{v}^*} \right\| \le \sqrt{2\beta ^j/\sigma }\left\| {\textbf{v}^0-\textbf{v}^*} \right\| \) for any \(j\ge 0\). In particular, \(\textbf{v}^j\rightarrow \textbf{v}^*\) as \(j\rightarrow \infty \).

(iii) If, in addition, \(\beta ^j\ge \beta ^{j+1}\) for any \(j\ge 0\), then \(\left\| {\textbf{v}^{j+1}-\textbf{v}^j} \right\| \le 2\sqrt{2\beta ^j/\sigma }\left\| {\textbf{v}^0-\textbf{v}^*} \right\| \).
Proof
Since f is \(\sigma \)-strongly convex, for any \(\textbf{v}\in \mathbb {R}^n\) and for any \(\varvec{\xi }\in \partial f\left( {\textbf{v}^*} \right) \), we have
$$f\left( {\textbf{v}} \right) \ge f\left( {\textbf{v}^*} \right) +\left\langle {\varvec{\xi },\textbf{v}-\textbf{v}^*} \right\rangle +\frac{\sigma }{2}\left\| {\textbf{v}-\textbf{v}^*} \right\| ^2.$$
Since \(\textbf{v}^*\) is a minimizer of the function f, the first-order optimality condition gives \(\textbf{0}_n\in \partial f\left( {\textbf{v}^*} \right) \); choosing \(\varvec{\xi }=\textbf{0}_n\) establishes item (i). Item (ii) then follows immediately from item (i) by plugging in \(\textbf{v}=\textbf{v}^j\) and using Assumption 1(b). Moreover, from Assumption 1(a) we obtain \(\textbf{v}^j\rightarrow \textbf{v}^*\) as \(j\rightarrow \infty \), as required.
To prove item (iii), notice that if \(\beta ^j\ge \beta ^{j+1}\) for any \(j\ge 0\), then from the triangle inequality and item (ii) we get
$$\left\| {\textbf{v}^{j+1}-\textbf{v}^j} \right\| \le \left\| {\textbf{v}^{j+1}-\textbf{v}^*} \right\| +\left\| {\textbf{v}^j-\textbf{v}^*} \right\| \le \left( {\sqrt{2\beta ^{j+1}/\sigma }+\sqrt{2\beta ^j/\sigma }} \right) \left\| {\textbf{v}^0-\textbf{v}^*} \right\| \le 2\sqrt{2\beta ^j/\sigma }\left\| {\textbf{v}^0-\textbf{v}^*} \right\| ,$$
and the proof is complete.\(\square \)
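The bounds of Lemma A.1 can be checked numerically on a simple instance. The sketch below takes \(\mathcal {A}\) to be gradient descent with step size \(1/L\) on a strongly convex quadratic, for which Assumption 1 holds with \(\beta ^j=\left( {L/2} \right) \left( {1-\sigma /L} \right) ^j\); this construction is illustrative only and is not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
Q = M.T @ M + np.eye(n)             # symmetric, eigenvalues >= 1, so f is strongly convex
c = rng.standard_normal(n)
eigs = np.linalg.eigvalsh(Q)        # ascending order
sigma, L = eigs[0], eigs[-1]        # strong convexity and smoothness constants

def f(v):
    # f(v) = 0.5 v'Qv - c'v, with unique minimizer v* = Q^{-1} c
    return 0.5 * v @ Q @ v - c @ v

v_star = np.linalg.solve(Q, c)
v0 = rng.standard_normal(n)
v, d0 = v0.copy(), np.linalg.norm(v0 - v_star)
for j in range(300):
    beta_j = (L / 2.0) * (1.0 - sigma / L) ** j
    # Assumption 1(b): function-value gap bounded by beta_j * ||v0 - v*||^2
    assert f(v) - f(v_star) <= beta_j * d0 ** 2 + 1e-10
    # Lemma A.1(ii): distance bound implied by strong convexity
    assert np.linalg.norm(v - v_star) <= np.sqrt(2.0 * beta_j / sigma) * d0 + 1e-9
    v = v - (Q @ v - c) / L         # gradient step with step size 1/L
```

For this quadratic, the gradient step contracts the distance to \(\textbf{v}^*\) by a factor of at most \(1-\sigma /L\) per iteration, which is exactly why the stated \(\beta ^j\) satisfies Assumption 1.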
Using the inequalities obtained in Lemma A.1, in the following lemma we prove convergence results for the FISTA method in the strongly convex setting.
Lemma A.2
For \(k\ge 0\), let \({\left\{ {{{\textbf{x}}}^{k,j}}\right\} }_{j\ge 0}\) and \({\left\{ {\textbf{y}^{k,j}}\right\} }_{j\ge 0}\) be sequences generated by FISTA for Problem \(\left( {\textrm{P}^k} \right) \) (steps 7–9 in Algorithm 1). Then,
(i) \(\left\| {{{\textbf{x}}}^{k,j+1}-\textbf{y}^{k,j}} \right\| \le 4\left\| {{{\textbf{x}}}^{k,j}-{{\textbf{x}}}^k_*} \right\| \sqrt{2\beta ^{k,j-1}_{\textrm{F}}/\sigma ^k}\) for all \(j\ge 1\).

(ii) For all \(j\ge 0\) we have
$$\begin{aligned} \varPsi ^k\left( {{{\textbf{x}}}^{k}} \right) -\varPsi ^k\left( {{{\textbf{x}}}^{k,j}} \right)&\ge \frac{\sigma ^k}{2}\left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^{k,j}} \right\| ^2-\sqrt{2\sigma ^k\beta ^{k,j}_{\textrm{F}}}\left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^k_*} \right\| \left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^{k,j}} \right\| \\&\quad -\beta ^{k,j}_{\textrm{F}}\left\| {{{\textbf{x}}}^{k}-{{\textbf{x}}}^{k,j}} \right\| ^2.\end{aligned}$$

(iii) \(\left\| {\textbf{w}^{k,j}} \right\| \le 8L^k\left\| {{{\textbf{x}}}^k-{{\textbf{x}}}^k_*} \right\| \sqrt{2\beta ^{k,j-2}_{\textrm{F}}/{\sigma ^k}}\) for all \(j\ge 2\) and some \(\textbf{w}^{k,j}\in \partial \varPsi ^k\left( {{{\textbf{x}}}^{k,j}} \right) \).
Proof
First, recall that \({{\textbf{x}}}^k={{\textbf{x}}}^{k,0}\) and that \({{\textbf{x}}}^{k,j^k}={{\textbf{x}}}^{k+1}\). In addition, recall that \({{\textbf{x}}}^k_*\) is the minimizer of the strongly convex function \(\varPsi ^k\) of Problem \(\left( {\textrm{P}^k} \right) \).
Since FISTA satisfies Assumption 1 with \(\beta ^{k,j}_{\textrm{F}}\) (see (6)), we can use Lemma A.1. Now we prove item (i). From Lemma A.1(iii), we have for all \(j\ge 0\) that
Hence, for any \(j\ge 1\) it follows that
where the first equality follows from step 9 in Algorithm 1, the second inequality follows from the fact that \(t_0=1\) and \(t_{j-1}\le t_j\) for any \(j\ge 1\) (see step 8 in Algorithm 1), the third inequality follows from (19), and the last inequality follows from the fact that \(\beta _{\textrm{F}}^{k,j}\le \beta _{\textrm{F}}^{k,j-1}\) for any \(j\ge 1\).
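The monotonicity \(t_{j-1}\le t_j\) invoked here is immediate from the update rule. Assuming step 8 of Algorithm 1 uses the standard FISTA rule \(t_{j+1}=\left( {1+\sqrt{1+4t_j^2}} \right) /2\) with \(t_0=1\), it can be verified directly:

```python
import math

# Standard FISTA momentum sequence: t_0 = 1, t_{j+1} = (1 + sqrt(1 + 4 t_j^2)) / 2.
t = 1.0
for j in range(200):
    t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
    assert t_next >= t                  # the sequence is nondecreasing
    assert t_next >= (j + 3) / 2.0      # known lower bound: t_j >= (j + 2) / 2
    t = t_next
```

The lower bound follows by induction, since \(t_{j+1}\ge \left( {1+2t_j} \right) /2=t_j+1/2\).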
Now we prove item (ii). From Lemma A.1(i), we have, for any \(j\ge 0\), that
Rearranging the terms yields
Since \(\left( {\sigma ^k/2} \right) \cdot \left\| {{{\textbf{x}}}^{k,j}-{{\textbf{x}}}^k_*} \right\| \ge 0\), from (20) and the Cauchy–Schwarz inequality we get
where the second inequality follows from (6) and Assumption 1(b) with \(\beta ^j=\beta ^{k,j}_{\textrm{F}}\), and Lemma A.1(ii).
Now we prove item (iii). For any \(j\ge 1\), denote
where the inclusion follows from the first-order optimality condition of step 7 in Algorithm 1. Now, for any \(j\ge 2\) we get
where the last inequality follows from item (i), and the proof is complete.\(\square \)
B. Proof of Proposition 3.1
Proposition B.1
For all \(k\ge 0\), let \({\left\{ {{{\textbf{x}}}^{k,j}}\right\} }_{j\ge 0}\) and \({\left\{ {\textbf{y}^{k,j}}\right\} }_{j\ge 0}\) be sequences generated by FISTA (steps 7–9 in Algorithm 1) for minimizing Problem \(\left( {\textrm{P}^k} \right) \). Assume that the sequence generated by Algorithm 1 is bounded. Then, there exists \(M>0\) such that for any \(k\ge 0\) it holds that
(i) \(\left\| {{{\textbf{x}}}^k-{{\textbf{x}}}^{k+1}} \right\| \le M\).

(ii) \(\left\| {{{\textbf{x}}}^k-{{\textbf{x}}}^k_*} \right\| \le M\).

(iii) \(\left\| {\nabla \varphi ^k\left( {\textbf{y}^{k,j}} \right) -L^k\textbf{y}^{k,j}} \right\| \le M\) for any \(j\ge 0\).
Proof
Since the sequence generated by NAM is bounded, there exists \(M_1>0\) such that
and that
for all \(k\ge 0\), and item (i) is established. To prove item (ii), notice that from item (i) we have for any \(k\ge 0\) that
From Lemma A.1(ii) and (7), we get
where the second inequality follows from Assumptions 1, 2(c) and 1(c), and the last inequality follows from (5). Combining (24) and (25), we get
Therefore, there exists \(M_2>0\), such that
for all \(k\ge 0\), and item (ii) is established.
Now we prove item (iii). To this end, we first prove that the sequences \({\left\{ {{{\textbf{x}}}^{k,j}}\right\} }_{j\ge 0}\) and \({\left\{ {\textbf{y}^{k,j}}\right\} }_{j\ge 0}\) are bounded. For any \(j\ge 0\), we have
where the second inequality follows by similar arguments as in (25), and the last inequality follows from (26). In addition,
and since the sequence \({\left\{ {{{\textbf{x}}}^k}\right\} }_{k\ge 0}\) is assumed to be bounded, it follows from (27) and (28) that there exists \(M_3>0\) such that \(\left\| {{{\textbf{x}}}^{k,j}} \right\| \le M_3\) for any \(k\ge 0\) and for any \(j\ge 0\). In addition,
where the second inequality follows from Lemma A.2(i) and the fact that \(\left\| {{{\textbf{x}}}^{k,j+1}} \right\| \le M_3\). Therefore, it follows from (26) and (29) that there exists \(M_4>0\) such that \(\left\| {\textbf{y}^{k,j}} \right\| \le M_4\) for any \(k\ge 0\) and for any \(j\ge 0\).
Last, notice that since the function \(G:\mathbb {R}^d\times \mathbb {R}^{d_0}\rightarrow \mathbb {R}\) in Problem (P) is continuously differentiable, its gradient is \(\mathcal {L}\)-Lipschitz continuous, for some \(\mathcal {L}>0\), over the compact set containing the bounded iterates (which is a subset of the domain \(\mathbb {R}^d\times \mathbb {R}^{d_0}\)). Hence, we have (recall that \(\varphi ^k\left( {{{\textbf{x}}}} \right) =G\left( {\textbf{z}_1^{k+1},\ldots ,\textbf{z}_{i-1}^{k+1},{{\textbf{x}}},\textbf{z}_{i+1}^k,\ldots ,\textbf{z}_p^k,\textbf{u}^{k+1}} \right) \))
where we used (22). Since \(\left\| {\nabla _{\textbf{z}_i}G\left( {\textbf{0}_{d+d_0}} \right) } \right\| \) is a constant independent of \(k\ge 0\) and \(j\ge 0\), it follows that there exists \(M_5>0\) such that \(\left\| {\nabla \varphi ^k\left( {\textbf{y}^{k,j}} \right) } \right\| \le M_5\). Hence,
for any \(k\ge 0\) and for any \(j\ge 0\).
Finally, by setting \(M\equiv \max {\left\{ {M_1,M_2,M_5+{\bar{L}}M_4}\right\} }\), the required results follow from (23), (26) and (30).
Cite this article
Gur, E., Sabach, S. & Shtern, S. Nested Alternating Minimization with FISTA for Non-convex and Non-smooth Optimization Problems. J Optim Theory Appl 199, 1130–1157 (2023). https://doi.org/10.1007/s10957-023-02310-4
Keywords
- Non-convex and non-smooth optimization
- Alternating minimization
- Global convergence
- Nested algorithms
- FISTA