Abstract
In this paper we consider the rank and zero-norm regularized least squares loss minimization problem with a spectral norm ball constraint. For this class of NP-hard optimization problems, we propose a two-stage convex relaxation approach by majorizing suitable locally Lipschitz continuous surrogates. Furthermore, the Frobenius norm error bound for the optimal solution of each stage is characterized, and a theoretical guarantee is established for the two-stage convex relaxation approach by showing that the error bound of the first-stage convex relaxation (i.e., the nuclear norm and \(\ell _1\)-norm regularized minimization problem) can be significantly reduced by the second-stage convex relaxation under a suitable restricted eigenvalue condition. We also verify the efficiency of the proposed approach on random test problems and on real problems.
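The first-stage relaxation trades the rank and zero norm for the nuclear norm and \(\ell_1\)-norm, whose proximal maps are soft-thresholding operators; the second stage then re-solves with the penalty weakened where the first stage detected signal. The following is a minimal pure-Python sketch of this two-stage idea on the \(\ell_1\) part alone; the data, the threshold `lam`, and the simple support-based reweighting rule are hypothetical illustrations, not the paper's actual algorithm.

```python
# Sketch of the two-stage idea on the l1 part only (hypothetical toy setup):
# stage 1 applies the plain soft-thresholding prox of lam*||.||_1; stage 2
# drops the penalty on entries whose stage-1 magnitude is nonzero, mimicking
# the effect of majorizing a locally Lipschitz surrogate of the zero norm.

def soft_threshold(x, lam):
    """Proximal operator of lam*|.|: shrinks x toward 0 by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def two_stage_weights(stage1_sol, eps=1e-6):
    """Second-stage weights: 0 on the detected support, 1 elsewhere."""
    return [0.0 if abs(v) > eps else 1.0 for v in stage1_sol]

# Toy data: noisy observation of a sparse vector.
obs = [3.0, -0.2, 0.1, -2.5]
lam = 0.5

stage1 = [soft_threshold(v, lam) for v in obs]                   # biased estimate
w = two_stage_weights(stage1)
stage2 = [soft_threshold(v, lam * wi) for v, wi in zip(obs, w)]  # debiased

print(stage1)  # support entries shrunk by lam, noise zeroed
print(stage2)  # shrinkage removed on the detected support
```

On this toy input, stage one shrinks the two large entries by `lam` (a bias) while zeroing the noise entries, and stage two removes that shrinkage on the detected support, which is the qualitative error reduction the two-stage scheme aims for.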
Additional information
Supported by the National Natural Science Foundation of China under Project Nos. 11501219 and 11701186, the Guangdong Natural Science Funds under Project Nos. 2015A030310298 and 2017A030310418, the Open Project Program of the State Key Lab of CAD and CG (Grant No. A1613), Zhejiang University, and the Fundamental Research Funds for the Central Universities (SCUT).
Appendix
1.1 The proof of Lemma 2.3
We first prove inequality (5). Without loss of generality, assume that \(n_1<n_2\) and that the index set \(\Omega \) takes the form
and all components \(G_{ij}\) with \(i+n_1(j-1)>s^*\) are arranged in descending order of \(|G_{ij}|\):
where \(i_0\) and \(j_0\) are nonnegative integers such that \(i_0+n_1(j_0-1)=s^*\). For \(k=1,2,\ldots \), let
except that the largest column index in the last block stops at \(n_2\). From the definition of these index sets, it is immediate that \(\Gamma =\Omega \cup \Omega _1\). Notice that \(\Vert \mathcal {P}_{\Omega _k}(G)\Vert _{\infty }\le \frac{\Vert \mathcal {P}_{\Omega _{k-1}}(G)\Vert _{1}}{t}\) for \(k>1\), which implies that \(\sum _{k>1}\Vert \mathcal {P}_{\Omega _k}(G)\Vert _{\infty }\le \frac{\Vert \mathcal {P}_{\Omega }(G)\Vert _{1}}{t}\). Hence,
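The bound just used rests on a pigeonhole argument: once the entries are sorted in descending order of magnitude and grouped into consecutive size-\(t\) blocks, the largest entry of each block is at most the average of the preceding block. A small pure-Python illustration with hypothetical data:

```python
# Numeric illustration of the block bound: sort entries by decreasing
# magnitude and split into consecutive blocks of size t; each block's max
# is bounded by the previous block's average (all data hypothetical).

def blocks(sorted_vals, t):
    return [sorted_vals[i:i + t] for i in range(0, len(sorted_vals), t)]

vals = sorted([3.0, 2.5, 2.5, 1.0, 0.9, 0.8, 0.2, 0.1], reverse=True)
t = 3
B = blocks(vals, t)

for prev, cur in zip(B, B[1:]):
    # entries are sorted, so max(cur) <= min(prev) <= sum(prev)/t
    assert max(cur) <= sum(prev) / t

# Summing over the tail blocks gives the l_inf-sum vs l1 bound used above.
assert sum(max(b) for b in B[1:]) <= sum(vals) / t
print("block maxima bounded by previous-block averages")
```

Note that every block except possibly the last has exactly \(t\) entries, which is why dividing the preceding block's sum by \(t\) is valid in the loop above.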
Combining the last inequality with the following inequality
we immediately obtain the desired result (5).
Next, we prove inequality (6). Let \(\widehat{H}\) be an arbitrary matrix from \(\mathcal {I}\). By the definition of \(\vartheta _+(2r^* +l)\), \(\Vert \mathcal {A}\mathcal {P}_{\mathcal {I}}(\widehat{H}-\widehat{G})\Vert _F^2 \le \vartheta _+(2r^* +l)\Vert \mathcal {P}_{\mathcal {I}}(\widehat{H}-\widehat{G})\Vert _F^2\) and \(\Vert \mathcal {A}(\widehat{H})\Vert _F^2\le \vartheta _+(2r^* +l)\Vert \widehat{H}\Vert _F^2\). Then,
We proceed by considering the following two cases.
Case 1: \(\mathrm{rank}((U_{2}^*)^{\mathbb {T}}\widehat{G}V_{2}^*)\le l\le \min (n_1 - r^*,n_2 - r^* )\). Now, by the expression of \(\mathcal {P}_{\mathcal {J}_1}\), we have
where the last two equalities are due to \((U_{2}^*)^{\mathbb {T}}\widehat{G}V_{2}^* =P_{1}[\mathrm{Diag}\left( \sigma ((U_{2}^*)^{\mathbb {T}}\widehat{G}V_{2}^*)\right) \ \ 0]Q_{1}^{\mathbb {T}}\). Note that \(\mathcal {P}_{\mathcal {T}^{\perp }}(\widehat{G})=U_{2}^*(U_{2}^*)^{\mathbb {T}}\widehat{G}V_{2}^*(V_{2}^*)^{\mathbb {T}}\) by (4). So, \(\mathcal {P}_{\mathcal {T}^{\perp }}(\widehat{G})=\mathcal {P}_{\mathcal {J}_1}(\widehat{G})\), i.e., \(\widehat{G}\in \mathcal {I}\). Then,
This inequality implies the desired result (6). Thus, we complete the proof for this case.
Case 2: \(l<\mathrm{rank}((U_{2}^*)^{\mathbb {T}}\widehat{G}V_{2}^*)\). Let k be the smallest positive integer such that \(kl\ge \min (n_1 - r^*,n_2 - r^*)\). Clearly, \(k\ge 2\). Let \(l_{i}\) and \(\widetilde{l}_{i}\) for \(i=1,2,\ldots ,k\) be such that
For each \(2\le i\le k\), we define the subspace \(\mathcal {J}_i:=\big \{U_{2}^*P_iZ(V_{2}^*Q_{i})^{\mathbb {T}}\ |\ Z\in \mathbb {R}^{l_{i}\times \widetilde{l}_{i} }\big \}\), where \(P_i\in \mathbb {O}^{(n_1 - r^*)\times l_{i}}\) is the matrix consisting of the \(\left( \sum _{j=1}^{i-1}l_{j}+1\right) \)th column to the \(\left( \sum _{j=1}^{i}l_{j}\right) \)th column of P; and \(Q_i\in \mathbb {O}^{(n_2 - r^*)\times \widetilde{l}_{i}}\) is the matrix consisting of the \(\left( \sum _{j=1}^{i-1}\widetilde{l}_{j}+1\right) \)th column to the \(\left( \sum _{j=1}^{i}\widetilde{l}_{j}\right) \)th column of Q. Clearly, \(\mathcal {J}_1\perp \mathcal {J}_i\) for \(i\ge 2\). For each \(i\ge 1\), it is easy to calculate that
This, together with \(\mathcal {P}_{\mathcal {T}^{\perp }}(\widehat{G})=U_{2}^*(U_{2}^*)^{\mathbb {T}}\widehat{G}V_{2}^*(V_{2}^*)^{\mathbb {T}}\), implies that \( \mathcal {P}_{\mathcal {T}^{\perp }}(\widehat{G}) =\sum _{i=1}^k\mathcal {P}_{\mathcal {J}_i}(\widehat{G}). \) Then, \(\langle \widehat{H},\mathcal {A}^*\mathcal {A}(\widehat{G})\rangle =\langle \widehat{H},\mathcal {A}^*\mathcal {A}\mathcal {P}_{\mathcal {I}}(\widehat{G})\rangle +{\textstyle {\sum _{i>1}\big \langle \widehat{H},\mathcal {A}^*\mathcal {A}\mathcal {P}_{\mathcal {J}_i}(\widehat{G})\big \rangle }}\). Consequently, we have that
where the first inequality uses the definition of \(\pi \) together with the facts that \(\widehat{H}\in \mathcal {I}\), \(\mathcal {P}_{\mathcal {J}_i}(\widehat{G})\in \mathcal {J}_i\) with \(\mathrm{rank}(\mathcal {P}_{\mathcal {J}_i}(\widehat{G}))\le l\), and \(\mathcal {I}\perp \mathcal {J}_i\) for \(i>1\); and the second inequality is due to
since \(\Vert \mathcal {P}_{\mathcal {J}_{i+1}}(\widehat{G})\Vert \le l^{-1}\Vert \mathcal {P}_{\mathcal {J}_i}(\widehat{G})\Vert _*\). Combining (26) with (25), we obtain inequality (6), which completes the proof.
1.2 The proof of Lemma 4.2
(a) Notice that \(\mathcal{P}_\mathcal{K}(\Delta L^k)=\Delta L^k - \mathcal{P}_{\mathcal {T}^\bot }(\Delta L^k) + \mathcal{P}_\mathcal{H}(\Delta L^k)\) implies \(L_q=q(\mathcal{P}_{\mathcal {T}^\bot }(\Delta L^k) - \mathcal{P}_\mathcal{H}(\Delta L^k)+ L^* ) +(1-q)L^k\). It suffices to argue that \(\Vert \mathcal{P}_{\mathcal {T}^\bot }(\Delta L^k) - \mathcal{P}_\mathcal{H}(\Delta L^k) + L^*\Vert \le \tau \). By the expression of \(\mathcal {P}_{\mathcal {T}^{\perp }}(\Delta L^k)\) and the SVD of \((U_2^*)^\mathbb {T}\Delta L^kV_2^*\), we have that
From the definition of the subspace \(\mathcal {H}\), we have \(\mathcal {P}_{\mathcal {H}}(Z)=U_{2}^*P_1P_1^{\mathbb {T}}(U_{2}^*)^{\mathbb {T}} ZV_2^*Q_1Q_1^\mathbb {T}(V_2^*)^\mathbb {T}\) for any \(Z\in \mathbb {R}^{n_1\times n_2}\). Together with the last equation, it follows that
where \(\sigma ^{\downarrow ,l}(Z)\) means the vector consisting of the first l components of \(\sigma (Z)\). Thus, we have that
where \(\sigma _{l+1}\ge \cdots \ge \sigma _n\) are the smallest \(n-l\) singular values of \((U_2^*)^{\mathbb {T}} \Delta L^k V_2^*\). Then, \(\mathcal{P}_{\mathcal {T}^\bot }(\Delta L^k) - \mathcal{P}_\mathcal{H}(\Delta L^k) + L^*\) has the SVD as
Note that \(\sigma _{l+1}\le \Vert \mathcal{P}_{\mathcal {T}^\bot }(\Delta L^k)\Vert =\Vert \mathcal{P}_{\mathcal {T}^\bot }(L^k)\Vert \le \tau \). The last equation implies that \(\Vert \mathcal{P}_{\mathcal {T}^\bot }(\Delta L^k) - \mathcal{P}_\mathcal{H}(\Delta L^k) + L^*\Vert \le \tau \). Thus, we show that \(\Vert L_q\Vert \le \tau \).
(b) Equations (27) and (28) imply that \(\Vert \mathcal {P}_{\mathcal {T}^\bot }(\Delta L^k )- \mathcal {P}_{\mathcal {H}}(\Delta L^k)\Vert \le l^{-1}\Vert \mathcal {P}_{\mathcal {H}}(\Delta L^k)\Vert _*\) and \(\Vert \mathcal {P}_{\mathcal {T}^\bot }(\Delta L^k )- \mathcal {P}_{\mathcal {H}} (\Delta L^k)\Vert _* =\Vert \mathcal {P}_{\mathcal {T}^\bot }(\Delta L^k )\Vert _*-\Vert \mathcal {P}_{\mathcal {H}} (\Delta L^k)\Vert _*\). Then, we have that
where the second equality uses \(\mathcal {P}_{\mathcal {K}}(\Delta L^k)=\mathcal {P}_{\mathcal {T}}(\Delta L^k)+\mathcal {P}_{\mathcal {H}}(\Delta L^k)\), and the last inequality uses the fact that \(xy\le \frac{(x+y)^2}{4}\) for any \(x,y\in \mathbb {R}\). In addition,
Let \(\gamma ^{k-1}:=\min ( a_{k-1},b_{k-1})\). From the last two inequalities, we have
where the second inequality uses the fact that \(x^2+y^2 \le (x+y)^2\) for any \(x,y\in \mathbb {R}_+ \), and the fourth inequality uses Lemma 4.1. Notice that \(\Vert \Delta L^k\Vert _F^2=\Vert \mathcal {P}_{\mathcal {K}}(\Delta L^k)\Vert _F^2+\Vert \mathcal {P}_{\mathcal {K}^{\perp }}(\Delta L^k)\Vert _F^2\) and \(\Vert \Delta S^k\Vert _F^2=\Vert \mathcal{P}_\Gamma (\Delta S^k)\Vert _F^2+\Vert \mathcal{P}_{\Gamma ^c}(\Delta S^k)\Vert _F^2\). From inequality (30), we readily obtain the desired result. The proof is completed.
1.3 The proof of Theorem 4.1
Let the subspaces \(\mathcal {H}\) and \(\mathcal {K}\) and the index sets \(\Gamma \) and \(\Lambda \) be defined as in Lemma 4.2. Let \(L_q:=L^k - q \mathcal{P}_\mathcal{K}(\Delta L^k )\) and \(S_q:=S^k - q \mathcal{P}_{\Gamma }(\Delta S^k)\) for any \(q\in (0,1)\). By Lemma 4.2 (a), the point \((L_q,S_q)\) is feasible for problem (13). Together with the optimality of \((L^k,S^k)\) for problem (13), this implies that
Then we get that
From the proof of Lemma 4.2 (a), \( U_1^*(V_1^*)^\mathbb {T} +U_2^* P \mathrm{Diag}(e) Q^\mathbb {T}(V_{2}^*)^\mathbb {T}\) is a subgradient of the nuclear norm at \(L^* + \mathcal {P}_{\mathcal {T}^\bot }(\Delta L^k) -\mathcal {P}_{\mathcal {H}}(\Delta L^k)\). Similarly, \( \mathrm{sgn}(S^*) + \mathrm{sgn}(\mathcal{P}_{\Omega ^c} (\Delta S^k))\) is a subgradient of the \(\ell _1\) norm at \(T^{k-1}\circ (S^*+\mathcal{P}_{\Omega ^c}(\Delta S^k)-\mathcal{P}_\Lambda (\Delta S^k))\). Thus, it follows that
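The subgradient facts invoked here can be sanity-checked numerically for the \(\ell_1\) norm: \(\mathrm{sgn}(X)\) always lies in the subdifferential at \(X\) (its zero entries satisfy \(0\in[-1,1]\)), so \(\Vert Y\Vert_1\ge \Vert X\Vert_1+\langle \mathrm{sgn}(X),Y-X\rangle\) for every \(Y\). A minimal pure-Python check on hypothetical matrices, flattened to lists for simplicity:

```python
# Check the l1-norm subgradient inequality ||Y||_1 >= ||X||_1 + <sgn(X), Y - X>
# on a few hypothetical 2x2 matrices (flattened to lists).

def sgn(x):
    return (x > 0) - (x < 0)

def l1(m):
    return sum(abs(v) for v in m)

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

X = [1.5, 0.0, -2.0, 0.0]   # sparse "S*"-like point; sgn is 0 on zero entries
g = [sgn(v) for v in X]
for Y in ([1.0, 0.3, -2.5, 0.0], [0.0, 0.0, 0.0, 0.0], [-1.0, 1.0, 1.0, -1.0]):
    lhs = l1(Y)
    rhs = l1(X) + inner(g, [y - x for x, y in zip(X, Y)])
    assert lhs >= rhs - 1e-12   # subgradient inequality holds
print("sgn(X) is a subgradient of the l1 norm at X")
```

The same convexity inequality with the nuclear-norm subgradient is what turns the optimality of \((L^k,S^k)\) into the bound derived next.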
Note that
where the second equality is obtained by \(\mathcal {K}:=\mathcal {T}\oplus \mathcal {H}\) and (29). In addition, it holds
and
Combining (31)–(34), we get that
where \(\delta _{k-1} := \Vert \mathcal{P}_\mathcal{T}(U_1^* (V_1^*)^\mathbb {T}-W^{k-1})\Vert _F\) and \(\overline{\delta }_{k-1} := \Vert \mathcal{P}_\Omega (T^{k-1})\Vert _F\). On the other hand, for any \(q\in (0,1)\), we have that
Combining (35) and (36), and taking the limit \(q\rightarrow 0^+\), we obtain that
Notice that \(\Vert \Delta L^k\Vert _\infty \le \Vert \Delta L^k\Vert \le 2\tau \). We have that
Similarly, we get that \( |\langle \Delta S^k, \mathcal{Q}(\mathcal{P}_\mathcal{K}(\Delta L^k))\rangle | \le 2\tau \Vert \mathcal {Q}\Vert _\infty \Vert \Delta S^k\Vert _1. \) Together with (37), it is immediate to obtain that
Using Lemma 2.3 with \(G=\Delta S^k\), \(H=\mathcal {P}_{\Gamma }(\Delta S^k)\), and \(\mathcal {J}_1=\mathcal {H}\), \(\mathcal {I}=\mathcal {K},\widehat{G}=\Delta L^{k}\), \(\widehat{H}=\mathcal {P}_{\mathcal {K}}(\Delta L^k)\) yields that \(\Vert \mathcal{P}_\Gamma (G-H)\Vert _F=0, \Vert \mathcal{P}_{\mathcal {I}}(\widehat{G}-\widehat{H})\Vert _F=0\) and
where the second inequality uses the eigenvalue bounds of the linear operator \(\mathcal{A}\) from Sect. 2, and the last inequality uses the fact that \(xy\le \frac{x^2/z+z y^2}{2}\) for any \(x,y,z\in \mathbb {R}_+ \). Let \( \beta ^k\equiv \max \big (\frac{\sqrt{ \vartheta _{+}^2(2r^*+l) \pi (2r^*+l,l)}}{\sqrt{l \vartheta _{-}(2r^*+l)}a_{k-1}},\frac{\sqrt{ \chi _+^2(s^*+t) \varpi (s^*+t,t)}}{\sqrt{t \chi _-(s^*+t)}b_{k-1}}\big ). \) From Lemma 4.1 and the definition of \(\widetilde{\gamma }:=\max \big (\frac{\vartheta _{+}^2(2r^*+l) \pi (2r^*+l,l)}{l\vartheta _{-}(2r^*+l)},\frac{ \chi _+^2(s^*+t) \varpi (s^*+t,t)}{t\chi _-(s^*+t)}\big )\),
where the second-to-last inequality is due to \(\beta ^k\le \sqrt{\widetilde{\gamma }}/ \min (a_{k-1},b_{k-1})\) and the definitions \(\xi _{k-1}:=\widehat{a}_{k-1}/ \min ( a_{k-1} , b_{k-1})\) and \(\eta _{k-1}:=\widehat{b}_{k-1}/ \min ( a_{k-1}, b_{k-1})\), and the last inequality uses the fact that \(x+y\le \sqrt{2(x^2+y^2)}\) for any \(x,y\in \mathbb {R}\). From Lemma 4.1 and the definitions \(\zeta _{k-1}:=\widehat{a}_{k-1}/b_{k-1}\) and \(\mu _{k-1}:=\widehat{b}_{k-1}/ b_{k-1}\), it follows that
Combining inequalities (38)–(40) and noticing that \(x^2+y^2 \le (x+y)^2\) holds for any \(x,y\in \mathbb {R}_+\), we obtain that
Combining Assumption 4.1 with Lemmas 2.1 and 2.2 gives \(\frac{\varpi (s^*+t,t)}{2t}\le c_2 \) and \(\frac{\pi (2r^*+l,l)}{2l}\le c_1\), and hence \(\widetilde{\gamma }\le 2\max \big (c_1 \frac{\vartheta _{+}^2(2r^*+l)}{\vartheta _{-}(2r^*+l)} , c_2 \frac{\chi _+^2(s^*+t)}{\chi _-(s^*+t)}\big ) \), which implies
where
and the assumptions on \(\xi _{k-1}^2\) and \(\eta _{k-1}^2\) are used. Hence, the coefficients on the left-hand side of (41) are positive.
Then, by the definitions of \(\widetilde{a}_{k-1},\widetilde{b}_{k-1},\overline{a}_{k-1}\) and \(\overline{b}_{k-1}\) in (21), we rewrite (41) as
where \(\widetilde{a}_{k-1},\widetilde{b}_{k-1}>0\). It is not hard to obtain that
which implies
and
Substituting the bounds on \(\Vert \mathcal {P}_{\mathcal {K}}(\Delta L^k)\Vert _F\) and \(\Vert \mathcal {P}_{\Gamma }(\Delta S^k)\Vert _F\) into (18) gives the desired result. This completes the proof.
Han, L., Bi, S. Two-stage convex relaxation approach to low-rank and sparsity regularized least squares loss. J Glob Optim 70, 71–97 (2018). https://doi.org/10.1007/s10898-017-0573-2