Two-stage convex relaxation approach to low-rank and sparsity regularized least squares loss

Journal of Global Optimization

Abstract

In this paper we consider the rank and zero norm regularized least squares loss minimization problem with a spectral norm ball constraint. For this class of NP-hard optimization problems, we propose a two-stage convex relaxation approach by majorizing some suitable locally Lipschitz continuous surrogates. Furthermore, the Frobenius norm error bound for the optimal solution of each stage is characterized, and a theoretical guarantee is established for the two-stage convex relaxation approach by showing that the error bound of the first stage convex relaxation (i.e., the nuclear norm and \(\ell _1\)-norm regularized minimization problem) can be substantially reduced by the second stage convex relaxation under a suitable restricted eigenvalue condition. We also verify the efficiency of the proposed approach by applying it to some random test problems and some real problems.

References

  1. Agarwal, A., Negahban, S., Wainwright, M.J.: Noisy matrix decomposition via convex relaxation: optimal rates in high dimensions. Ann. Stat. 40, 1171–1197 (2012)

  2. Aybat, N.S., Ma, S., Goldfarb, D.: Noisy efficient algorithms for robust and stable principal component pursuit problems. Comput. Optim. Appl. 58, 1–29 (2014)

  3. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  4. Bi, S.J., Pan, S.H.: Multi-stage convex relaxation approach based on equivalent MPGCC to rank regularized minimization. SIAM J. Control Optim. 55, 2493–2518 (2017)

  5. Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM 58, 1–37 (2011)

  6. Chandrasekaran, V., Sanghavi, S., Parrilo, P.A., Willsky, A.S.: Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim. 21, 572–596 (2011)

  7. Chandrasekaran, V., Parrilo, P., Willsky, A.: Latent variable graphical model selection via convex optimization. Ann. Stat. 40, 1935–1967 (2012)

  8. Chartrand, R.: Nonconvex splitting for regularized low-rank + sparse decomposition. IEEE Trans. Signal Process. 60, 5810–5819 (2012)

  9. Dong, W., Shi, G., Hu, X., Ma, Y.: Nonlocal sparse and low-rank regularization for optical flow estimation. IEEE Trans. Image Process. 23, 4527–4538 (2014)

  10. Fazel, M.: Matrix rank minimization with applications. Stanford University, Dissertation for Ph.D. Degree. State of California (2002)

  11. Golbabaee, M., Vandergheynst, P.: Hyperspectral image compressed sensing via low-rank and joint sparse matrix recovery. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, Kyoto, pp. 2741–2744 (2011)

  12. Han, L., Bi, S., Pan, S.H.: Two-stage convex relaxation approach to least squares loss constrained low-rank plus sparsity optimization. Comput. Optim. Appl. 64, 119–148 (2016)

  13. He, R., Sun, Z., Tan, T., Zheng, W.S.: Recovery of corrupted low-rank matrices via half-quadratic based nonconvex minimization. In: Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, Providence, RI, pp. 2889–2896 (2011)

  14. Hsu, D., Kakade, S.M., Zhang, T.: Robust matrix decomposition with sparse corruptions. IEEE Trans. Inf. Theory 57, 7221–7234 (2011)

  15. Kong, L., Xiu, N.H.: Exact low-rank matrix recovery via nonconvex Schatten p-minimization. Asia-Pacific J. Oper. Res. 30, 1340010 (2013)

  16. Lai, M.J., Xu, Y., Yin, W.: Improved iteratively reweighted least squares for unconstrained smoothed \(l_q\) minimization. SIAM J. Numer. Anal. 5, 927–957 (2013)

  17. Lewis, A.S.: The convex analysis of unitarily invariant matrix functions. J. Convex Anal. 2, 173–183 (1995)

  18. Li, X., Ng, M.K., Yuan, X.: Median filtering-based methods for static background extraction from surveillance video. Numer. Linear Algebra Appl. 22, 1–10 (2015)

  19. McCoy, M., Tropp, J.: Two proposals for robust PCA using semidefinite programming. Electron. J. Stat. 2, 1123–1160 (2011)

  20. Miao, W.M.: Matrix completion model with fixed basis coefficients and rank regularized problems with hard constraints. National University of Singapore, Dissertation for Ph.D. Degree. Republic of Singapore (2013)

  21. Miao, W.M., Pan, S.H., Sun, D.F.: A rank-corrected procedure for matrix completion with fixed basis coefficients. Math. Program. 159, 289–338 (2016)

  22. Peng, Y., Ganesh, A., Wright, J., Ma, Y.: RASL: Robust alignment via sparse and low-rank decomposition for linearly correlated images. IEEE Trans. Pattern Anal. Mach. Intell. 34, 2233–2246 (2012)

  23. Rao, G., Peng, Y., Xun, Z.B.: The robust low-rank and sparse matrix decomposition based on \(S_{1/2}\) modeling. Sci. China Inf. Sci. 6, 733–748 (2013). (in Chinese)

  24. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)

  25. Shu, X., Ahuja, N.: Imaging via three-dimensional compressive sampling (3DCS). In: Proceedings of IEEE International Conference on Computer Vision, pp. 439–436 (2011)

  26. Tao, M., Yuan, X.: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 21, 57–81 (2011)

  27. Waters, A., Sankaranarayanan, A., Baraniuk, R.: Sparcs: recovering low-rank and sparse matrices from compressive measurements. In: Proceedings of Neural Information Processing Systems, Granada, Spain, pp. 1–9 (2011)

  28. Wright, J., Ganesh, A., Min, K., Ma, Y.: Compressive principal component pursuit. In: IEEE International Symposium on Information Theory Proceedings, pp. 1276–1280 (2012)

  29. Yang, L., Pong, T.K., Chen, X.: Alternating direction method of multipliers for a class of nonconvex and nonsmooth problems with applications to background/foreground extraction. SIAM J. Imaging Sci. 10, 74–110 (2017)

  30. Zhang, T.: Some sharp performance bounds for least squares regression with \(L_1\) regularization. Ann. Stat. 37, 2109–2144 (2009)

  31. Zhang, Y., Mu, C., Kuo, H., Wright, J.: Towards guaranteed illumination models for nonconvex objects. In: International Conference on Computer Vision (2013)

  32. Zhang, Z., Ganesh, A., Liang, X., Ma, Y.: TILT: Transform-invariant low-rank textures. Int. J. Comput. Vis. 99, 1–24 (2012)

  33. Zhou, Z., Li, X., Wright, J., Candès, E., Ma, Y.: Stable principal component pursuit. In: Proceedings of IEEE International Symposium on Information Theory, Austin, TX, pp. 1518–1522 (2010)

Author information

Corresponding author

Correspondence to Le Han.

Additional information

Supported by the National Natural Science Foundation of China under Project Nos. 11501219 and 11701186, the Guangdong Natural Science Funds under Project Nos. 2015A030310298 and 2017A030310418, the Open Project Program of the State Key Lab of CAD and CG (Grant No. A1613), Zhejiang University, and the Fundamental Research Funds for the Central Universities (SCUT).

Appendix

1.1 The proof of Lemma 2.3

We first show the proof of the inequality (5). Without loss of generality, we assume that \(n_1<n_2\) and the index set \(\Omega \) takes the form

$$\begin{aligned} \Omega =\big \{(i,j)\ |\ i+n_1(j-1)\le s^*\ \ \mathrm{with}\ i\in \{1,\ldots ,n_1\}, j\in \{1,\ldots ,\lfloor {s^*}/{n_1}\rfloor +1\}\big \}, \end{aligned}$$

and all components \(G_{ij}\) with \(i+n_1(j-1)>s^*\) are arranged in a descending order of \(|G_{ij}|\):

$$\begin{aligned} |G_{i_0+1,j_0}|\ge \cdots \ge |G_{n_1,j_0}|\ge |G_{1,j_0+1}|\ge \cdots \ge |G_{n_1,j_0+1}|\ge |G_{1,j_0+2}|\ge \cdots \ge |G_{n_1,n_2}|, \end{aligned}$$

where \(i_0\) and \(j_0\) are nonnegative integers such that \(i_0+n_1(j_0-1)=s^*\). For \(k=1,2,\ldots \), let

$$\begin{aligned}&\Omega _k:=\Big \{(i,j)\ |\ s^*+(k-1)t<i+n_1(j-1)\le s^*+kt\ \ \mathrm{for}\ i\in \{1,\ldots ,n_1\}, \\&\qquad \qquad j\in \big \{\lfloor \frac{s^*}{n_1}\rfloor +1,\ldots ,\lfloor \frac{s^*+kt}{n_1} \rfloor +1\big \}\Big \}, \end{aligned}$$

except that the largest column index in the last block stops at \(n_2\). From the definition of these index sets, it is immediate that \(\Gamma =\Omega \cup \Omega _1\). Notice that \(\Vert \mathcal {P}_{\Omega _k}(G)\Vert _{\infty }\le \frac{\Vert \mathcal {P}_{\Omega _{k-1}}(G)\Vert _{1}}{t}\) when \(k>1\), which implies that \(\sum _{k>1}\Vert \mathcal {P}_{\Omega _k}(G)\Vert _{\infty }\le \frac{\Vert \mathcal {P}_{\Omega ^c}(G)\Vert _{1}}{t}\). Hence,

$$\begin{aligned} \langle H,\mathcal{A}^* \mathcal{A}(G)\rangle -\langle H,\mathcal{A}^* \mathcal{A}\mathcal {P}_{\Gamma }(G-H)\rangle&=\langle H,\mathcal{A}^* \mathcal{A}(H)\rangle + \langle H, \mathcal{A}^* \mathcal{A}\mathcal {P}_{\Gamma ^c}(G)\rangle \nonumber \\&=\langle H,\mathcal{A}^* \mathcal{A}(H)\rangle + \sum _{k>1}\langle H, \mathcal{A}^* \mathcal{A}\mathcal {P}_{\Omega _k}(G)\rangle \nonumber \\&=\langle H,\mathcal{A}^* \mathcal{A}(H)\rangle \left[ 1+\frac{\sum _{k>1}\langle H, \mathcal{A}^* \mathcal{A}\mathcal {P}_{\Omega _k}(G)\rangle }{\langle H,\mathcal{A}^* \mathcal{A}(H)\rangle }\right] \nonumber \\&\ge \langle H,\mathcal{A}^* \mathcal{A}(H)\rangle \left[ 1-\varpi (s^*+t,t)\sum _{k>1}\frac{\Vert \mathcal {P}_{\Omega _k}(G)\Vert _{\infty }}{\Vert H\Vert _F}\right] \nonumber \\&\ge \langle H,\mathcal{A}^* \mathcal{A}(H)\rangle \left[ 1-\frac{\varpi (s^*+t,t)}{t}\frac{\Vert \mathcal {P}_{\Omega ^c}(G)\Vert _1}{\Vert H\Vert _F}\right] . \end{aligned}$$

Combining the last inequality with the following inequality

$$\begin{aligned}&\langle H,\mathcal{A}^* \mathcal{A}\mathcal {P}_{\Gamma }(G-H)\rangle \ge -\Vert \mathcal {A}(H)\Vert _F\Vert \mathcal {A}\mathcal {P}_{\Gamma }(G-H)\Vert _F\\&\quad \ge -\chi _{+}(s^*+t)\Vert H\Vert _F\Vert \mathcal {P}_{\Gamma }(G-H)\Vert _F, \end{aligned}$$

we immediately obtain the desired result (5).
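The block-wise ("shelling") bound used in this proof can be checked numerically. The following is a minimal sketch, not part of the original argument: it treats the entries of \(G\) outside \(\Omega \) as a flat list, sorts them by magnitude, cuts them into blocks of size \(t\), and compares the sup-norm of each later block with the \(\ell _1\)-norm of the block before it. The function name `shelling_check` and the random test data are hypothetical and chosen only for illustration.

```python
import random


def shelling_check(values, t):
    """Sort |values| in descending order, split them into blocks of size t,
    and compare sup-norms of later blocks with l1-norms of earlier blocks."""
    mags = sorted((abs(v) for v in values), reverse=True)
    blocks = [mags[i:i + t] for i in range(0, len(mags), t)]
    # Block-wise bound: the largest entry of block k is at most the
    # average of block k-1, because block k-1 is full and dominates it.
    blockwise_ok = all(
        max(blocks[k]) <= sum(blocks[k - 1]) / t + 1e-12
        for k in range(1, len(blocks))
    )
    lhs = sum(max(b) for b in blocks[1:])  # sum over k > 1 of the sup-norms
    rhs = sum(mags) / t                    # l1-norm of all sorted entries, over t
    return blockwise_ok, lhs, rhs


random.seed(0)
# Hypothetical stand-in for the entries of G outside Omega.
tail_entries = [random.gauss(0.0, 1.0) for _ in range(37)]
ok, lhs, rhs = shelling_check(tail_entries, t=5)
print(ok, lhs <= rhs)
```

Summing the block-wise bound over \(k>1\) is what yields the factor \(\Vert \mathcal {P}_{\Omega ^c}(G)\Vert _1/t\) in the last line of the displayed chain.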

Next, we give the proof of the inequality (6). Let \(\widehat{H}\) be an arbitrary matrix from \(\mathcal {I}\). By the definition of \(\vartheta _+(2r^* +l)\), \(\Vert \mathcal {A}\mathcal {P}_{\mathcal {I}}(\widehat{H}-\widehat{G})\Vert _F^2 \le \vartheta _+(2r^* +l)\Vert \mathcal {P}_{\mathcal {I}}(\widehat{H}-\widehat{G})\Vert _F^2\) and \(\Vert \mathcal {A}(\widehat{H})\Vert _F^2\le \vartheta _+(2r^* +l)\Vert \widehat{H}\Vert _F^2\). Then,

$$\begin{aligned}&\big \langle \widehat{H},\mathcal {A}^*\mathcal {A}\mathcal {P}_{\mathcal {I}} (\widehat{G}-\widehat{H})\big \rangle \ge -\Vert \mathcal {A}(\widehat{H})\Vert _F\Vert \mathcal {A}\mathcal {P}_{\mathcal {I}}(\widehat{H}-\widehat{G})\Vert _F\nonumber \\&\quad \ge -\vartheta _+(2r^*+l)\Vert \widehat{H}\Vert _F\Vert \mathcal {P}_{\mathcal {I}}(\widehat{H}-\widehat{G})\Vert _F. \end{aligned}$$
(25)

We proceed by considering the following two cases.

Case 1: \(\mathrm{rank}((U_{2}^*)^{\mathbb {T}}\widehat{G}V_{2}^*)\le l\le \min (n_1 - r^*,n_2 - r^* )\). Now, by the expression of \(\mathcal {P}_{\mathcal {J}_1}\), we have

$$\begin{aligned} \mathcal {P}_{\mathcal {J}_1}(\widehat{G})&=U_2^*P_{1}P_{1}^{\mathbb {T}}(U_2^*)^{\mathbb {T}}\widehat{G}V_2^*Q_{1}Q_{1}^{\mathbb {T}}(V_2^*)^{\mathbb {T}} =U_2^*P_{1}\big [\mathrm{Diag}(\sigma ((U_2^*)^{\mathbb {T}}\widehat{G}V_2^*))\ \ 0\big ]Q_{1}^{\mathbb {T}}(V_2^*)^{\mathbb {T}}\\&=U_2^*(U_2^*)^{\mathbb {T}}\widehat{G}V_2^*(V_2^*)^{\mathbb {T}}, \end{aligned}$$

where the last two equalities are due to \((U_{2}^*)^{\mathbb {T}}\widehat{G}V_{2}^* =P_{1}[\mathrm{Diag}\left( \sigma ((U_{2}^*)^{\mathbb {T}}\widehat{G}V_{2}^*)\right) \ \ 0]Q_{1}^{\mathbb {T}}\). Note that \(\mathcal {P}_{\mathcal {T}^{\perp }}(\widehat{G})=U_{2}^*(U_{2}^*)^{\mathbb {T}}\widehat{G}V_{2}^*(V_{2}^*)^{\mathbb {T}}\) by (4). So, \(\mathcal {P}_{\mathcal {T}^{\perp }}(\widehat{G})=\mathcal {P}_{\mathcal {J}_1}(\widehat{G})\), i.e., \(\widehat{G}\in \mathcal {I}\). Then,

$$\begin{aligned} \langle \mathcal {A}(\widehat{H}),\mathcal {A}(\widehat{G})\rangle&=\langle \mathcal {A}(\widehat{H}),\mathcal {A}(\widehat{H})\rangle + \langle \mathcal {A}(\widehat{H}),\mathcal {A}\mathcal {P}_{\mathcal {I}}(\widehat{G}-\widehat{H})\rangle \\&\ge \langle \mathcal {A}(\widehat{H}),\mathcal {A}(\widehat{H})\rangle -\vartheta _+(2r^* +l)\Vert \widehat{H}\Vert _F\Vert \mathcal {P}_{\mathcal {I}}(\widehat{H}-\widehat{G})\Vert _F. \end{aligned}$$

This inequality implies the desired result (6). Thus, we complete the proof for this case.

Case 2: \(l<\mathrm{rank}((U_{2}^*)^{\mathbb {T}}\widehat{G}V_{2}^*)\). Let k be the smallest positive integer such that \(kl\ge \min (n_1 - r^*,n_2 - r^*)\). Clearly, \(k\ge 2\). Let \(l_{i}\) and \(\widetilde{l}_{i}\) for \(i=1,2,\ldots ,k\) be such that

$$\begin{aligned} l_{1}\!=\!\cdots =l_{k-1}=l,\ l_{k}=n_1 - r^* -l(k-1),\ \ \widetilde{l}_{1}\!=\!\cdots \!=\!\widetilde{l}_{k-1}\!=\!l,\ \widetilde{l}_{k}\!=\!n_2 - r^*-l(k-1). \end{aligned}$$

For each \(2\le i\le k\), we define the subspace \(\mathcal {J}_i:=\big \{U_{2}^*P_iZ(V_{2}^*Q_{i})^{\mathbb {T}}\ |\ Z\in \mathbb {R}^{l_{i}\times \widetilde{l}_{i} }\big \}\), where \(P_i\in \mathbb {O}^{(n_1 - r^*)\times l_{i}}\) is the matrix consisting of the \(\left( \sum _{j=1}^{i-1}l_{j}+1\right) \)th column to the \(\left( \sum _{j=1}^{i}l_{j}\right) \)th column of \(P\); and \(Q_i\in \mathbb {O}^{(n_2 - r^*)\times \widetilde{l}_{i}}\) is the matrix consisting of the \(\left( \sum _{j=1}^{i-1}\widetilde{l}_{j}+1\right) \)th column to the \(\left( \sum _{j=1}^{i}\widetilde{l}_{j}\right) \)th column of \(Q\). Clearly, \(\mathcal {J}_1\perp \mathcal {J}_i\) for \(i\ge 2\). For each \(i\ge 1\), it is easy to calculate that

$$\begin{aligned} \mathcal {P}_{\mathcal {J}_i}(Z)=U_{2}^*P_{i}(U_{2}^*P_{i})^{\mathbb {T}}ZV_{2}^*Q_{i}(V_{2}^*Q_{i})^{\mathbb {T}} \quad \ \forall Z\in \mathbb {R}^{n_1\times n_2}. \end{aligned}$$

This, together with \(\mathcal {P}_{\mathcal {T}^{\perp }}(\widehat{G})=U_{2}^*(U_{2}^*)^{\mathbb {T}}\widehat{G}V_{2}^*(V_{2}^*)^{\mathbb {T}}\), implies that \( \mathcal {P}_{\mathcal {T}^{\perp }}(\widehat{G}) =\sum _{i=1}^k\mathcal {P}_{\mathcal {J}_i}(\widehat{G}). \) Then, \(\langle \widehat{H},\mathcal {A}^*\mathcal {A}(\widehat{G})\rangle =\langle \widehat{H},\mathcal {A}^*\mathcal {A}\mathcal {P}_{\mathcal {I}}(\widehat{G})\rangle +{\textstyle {\sum _{i>1}\big \langle \widehat{H},\mathcal {A}^*\mathcal {A}\mathcal {P}_{\mathcal {J}_i}(\widehat{G})\big \rangle }}\). Consequently, we have that

$$\begin{aligned}&\big \langle \widehat{H},\mathcal {A}^*\mathcal {A}(\widehat{G})\big \rangle -\big \langle \widehat{H},\mathcal {A}^*\mathcal {A}\mathcal {P}_{\mathcal {I}}(\widehat{G}-\widehat{H})\big \rangle \nonumber \\&=\langle \widehat{H},\mathcal {A}^*\mathcal {A}(\widehat{H})\rangle +\sum _{i>1}\big \langle \mathcal {P}_{\mathcal {I}}(\widehat{H}),\mathcal {A}^*\mathcal {A}\mathcal {P}_{\mathcal {J}_i}(\widehat{G})\big \rangle \nonumber \\&=\langle \widehat{H},\mathcal {A}^*\mathcal {A}(\widehat{H})\rangle \Big (1+\sum _{i>1}\frac{\langle \widehat{H},\mathcal {A}^*\mathcal {A}\mathcal {P}_{\mathcal {J}_i}(\widehat{G})\rangle \Vert \widehat{H}\Vert _F}{\Vert \mathcal {A}(\widehat{H})\Vert _F^2\Vert \mathcal {P}_{\mathcal {J}_i}(\widehat{G})\Vert }\frac{\Vert \mathcal {P}_{\mathcal {J}_i}(\widehat{G})\Vert }{\Vert \widehat{H}\Vert _F} \Big )\nonumber \\&\ge \langle \widehat{H},\mathcal {A}^*\mathcal {A}(\widehat{H})\rangle \Big (1-\pi ( 2r^* +l,l)\frac{\sum _{i>1}\Vert \mathcal {P}_{\mathcal {J}_i}(\widehat{G})\Vert }{\Vert \widehat{H}\Vert _F}\Big )\qquad \nonumber \\&\ge \langle \widehat{H},\mathcal {A}^*\mathcal {A}(\widehat{H})\rangle \Big (1-\frac{\pi ( 2r^* +l,l)\Vert \mathcal {P}_{\mathcal {T}^{\perp }}(\widehat{G})\Vert _*}{l\Vert \widehat{H}\Vert _F}\Big ) \end{aligned}$$
(26)

where the first inequality uses the definition of \(\pi \) together with the facts that \(\widehat{H}\in \mathcal {I}\), \(\mathcal {P}_{\mathcal {J}_i}(\widehat{G})\in \mathcal {J}_i\), \(\mathrm{rank}(\mathcal {P}_{\mathcal {J}_i}(\widehat{G}))\le l\) and \(\mathcal {I}\perp \mathcal {J}_i\) for \(i>1\), and the second inequality is due to

$$\begin{aligned} {\textstyle {\sum _{i>1}\Vert \mathcal {P}_{\mathcal {J}_i}(\widehat{G})\Vert \le l^{-1}\sum _{i=1}^{k}\big \Vert \mathcal {P}_{\mathcal {J}_i}(\widehat{G})\big \Vert _* =l^{-1}\Vert \mathcal {P}_{\mathcal {T}^{\perp }}(\widehat{G})\Vert _*}}, \end{aligned}$$

since \(\Vert \mathcal {P}_{\mathcal {J}_{i+1}}(\widehat{G})\Vert \le l^{-1}\Vert \mathcal {P}_{\mathcal {J}_i}(\widehat{G})\Vert _*\). Combining (26) with (25), we get the inequality (6). Thus, we complete the proof.
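The block bound invoked in the last step can be spelled out as follows. This is a sketch under the assumption, consistent with Case 1 but not restated here, that \(P\) and \(Q\) are the singular-vector matrices of \((U_{2}^*)^{\mathbb {T}}\widehat{G}V_{2}^*\) with singular values \(\sigma _1\ge \sigma _2\ge \cdots \) in descending order, so that \(\mathcal {P}_{\mathcal {J}_i}(\widehat{G})\) carries the singular values \(\sigma _{(i-1)l+1},\ldots ,\sigma _{il}\) for \(i\le k-1\). The largest singular value of the next block is then dominated by the average of the current block:

$$\begin{aligned} \Vert \mathcal {P}_{\mathcal {J}_{i+1}}(\widehat{G})\Vert =\sigma _{il+1}\le \frac{1}{l}\sum _{j=(i-1)l+1}^{il}\sigma _j =\frac{1}{l}\Vert \mathcal {P}_{\mathcal {J}_i}(\widehat{G})\Vert _*,\qquad i=1,\ldots ,k-1. \end{aligned}$$

Summing over \(i=1,\ldots ,k-1\) gives the displayed bound on \(\sum _{i>1}\Vert \mathcal {P}_{\mathcal {J}_i}(\widehat{G})\Vert \).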

1.2 The proof of Lemma 4.2

(a) Notice that \(\mathcal{P}_\mathcal{K}(\Delta L^k)=\Delta L^k - \mathcal{P}_{\mathcal {T}^\bot }(\Delta L^k) + \mathcal{P}_\mathcal{H}(\Delta L^k)\) implies \(L_q=q(\mathcal{P}_{\mathcal {T}^\bot }(\Delta L^k) - \mathcal{P}_\mathcal{H}(\Delta L^k)+ L^* ) +(1-q)L^k\). It suffices to argue that \(\Vert \mathcal{P}_{\mathcal {T}^\bot }(\Delta L^k) - \mathcal{P}_\mathcal{H}(\Delta L^k) + L^*\Vert \le \tau \). By the expression of \(\mathcal {P}_{\mathcal {T}^{\perp }}(\Delta L^k)\) and the SVD of \((U_2^*)^\mathbb {T}\Delta L^kV_2^*\), we have that

$$\begin{aligned} \mathcal {P}_{\mathcal {T}^{\perp }}(\Delta L^{k}) =U_{2}^*P\mathrm{Diag}\big (\sigma ((U^*_2)^\mathbb {T}\Delta L^{k}V^*_{2})\big )Q^\mathbb {T}(V_{2}^*)^{\mathbb {T}}. \end{aligned}$$
(27)

From the definition of the subspace \(\mathcal {H}\), we have \(\mathcal {P}_{\mathcal {H}}(Z)=U_{2}^*P_1P_1^{\mathbb {T}}(U_{2}^*)^{\mathbb {T}} ZV_2^*Q_1Q_1^\mathbb {T}(V_2^*)^\mathbb {T}\) for any \(Z\in \mathbb {R}^{n_1\times n_2}\). Together with the last equation, it follows that

$$\begin{aligned} \mathcal {P}_{\mathcal {H}}(\Delta L^{k})&=\mathcal {P}_{\mathcal {H}}(\mathcal {P}_{\mathcal {T}^{\perp }}(\Delta L^{k})) = \mathcal {P}_{\mathcal {H}}\big (U_{2}^*P\mathrm{Diag}\big (\sigma ((U^*_2)^\mathbb {T}\Delta L^{k}V^*_{2})\big )Q^\mathbb {T}(V_{2}^*)^{\mathbb {T}}\big )\nonumber \\&=U_{2}^*P_1P_1^{\mathbb {T}}(U_{2}^*)^{\mathbb {T}}U_{2}^*P\mathrm{Diag}\big (\sigma ((U^*_2)^\mathbb {T}\Delta L^{k}V^*_{2})\big )Q^\mathbb {T}(V_{2}^*)^{\mathbb {T}}V_2^*Q_1Q_1^\mathbb {T}(V_2^*)^\mathbb {T}\nonumber \\&=U_{2}^*P_1\mathrm{Diag}\big (\sigma ^{\downarrow ,l}((U^*_2)^\mathbb {T}\Delta L^{k}V^*_{2})\big )Q_1^\mathbb {T}(V_2^*)^\mathbb {T} \end{aligned}$$
(28)

where \(\sigma ^{\downarrow ,l}(Z)\) means the vector consisting of the first l components of \(\sigma (Z)\). Thus, we have that

$$\begin{aligned} \mathcal{P}_{\mathcal {T}^\bot }(\Delta L^k) - \mathcal{P}_\mathcal{H}(\Delta L^k)=U_2^*P \mathrm{Diag}(0,\cdots ,0,\sigma _{l+1},\cdots ,\sigma _n)Q^{\mathbb {T}}(V_2^*)^{\mathbb {T}} \end{aligned}$$

where \(\sigma _{l+1}\ge \cdots \ge \sigma _n\) are the smallest \(n-l\) singular values of \((U_2^*)^{\mathbb {T}} \Delta L^k V_2^*\). Then, \(\mathcal{P}_{\mathcal {T}^\bot }(\Delta L^k) - \mathcal{P}_\mathcal{H}(\Delta L^k) + L^*\) has the SVD as

$$\begin{aligned}{}[U_1^* \ U_2^*P]\mathrm{Diag}(\sigma _1(L^*),\ldots ,\sigma _{r^*}(L^*),0,\ldots ,0,\sigma _{l+1},\ldots ,\sigma _n)[V_1^* \ V_2^* Q]^{\mathbb {T}}. \end{aligned}$$
(29)

Note that \(\sigma _{l+1}\le \Vert \mathcal{P}_{\mathcal {T}^\bot }(\Delta L^k)\Vert =\Vert \mathcal{P}_{\mathcal {T}^\bot }(L^k)\Vert \le \tau \). The last equation implies that \(\Vert \mathcal{P}_{\mathcal {T}^\bot }(\Delta L^k) - \mathcal{P}_\mathcal{H}(\Delta L^k) + L^*\Vert \le \tau \). Thus, we show that \(\Vert L_q\Vert \le \tau \).
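The closing step of part (a) is a convexity argument, written out here as a sketch. It assumes \(\Vert L^k\Vert \le \tau \) (feasibility of \(L^k\) for the spectral norm ball constraint) and \(\Vert L^*\Vert \le \tau \), neither of which is restated in this appendix. By (29), the spectral norm of \(\mathcal{P}_{\mathcal {T}^\bot }(\Delta L^k) - \mathcal{P}_\mathcal{H}(\Delta L^k) + L^*\) equals \(\max \{\sigma _1(L^*),\sigma _{l+1}\}\le \tau \), and therefore, for any \(q\in (0,1)\),

$$\begin{aligned} \Vert L_q\Vert&\le q\,\Vert \mathcal{P}_{\mathcal {T}^\bot }(\Delta L^k) - \mathcal{P}_\mathcal{H}(\Delta L^k) + L^*\Vert +(1-q)\Vert L^k\Vert \\&\le q\tau +(1-q)\tau =\tau . \end{aligned}$$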

(b) Equations (27) and (28) imply that \(\Vert \mathcal {P}_{\mathcal {T}^\bot }(\Delta L^k )- \mathcal {P}_{\mathcal {H}}(\Delta L^k)\Vert \le l^{-1}\Vert \mathcal {P}_{\mathcal {H}}(\Delta L^k)\Vert _*\) and \(\Vert \mathcal {P}_{\mathcal {T}^\bot }(\Delta L^k )- \mathcal {P}_{\mathcal {H}} (\Delta L^k)\Vert _* =\Vert \mathcal {P}_{\mathcal {T}^\bot }(\Delta L^k )\Vert _*-\Vert \mathcal {P}_{\mathcal {H}} (\Delta L^k)\Vert _*\). Then, we have that

$$\begin{aligned} \Vert \mathcal {P}_{\mathcal {K}^\bot }(\Delta L^k)\Vert _F&= \Vert \Delta L^k - \mathcal {P}_{\mathcal {K}}(\Delta L^k)\Vert _F\le (\Vert \Delta L^k -\mathcal {P}_{\mathcal {K}}(\Delta L^k)\Vert \Vert \Delta L^k - \mathcal {P}_{\mathcal {K}}(\Delta L^k)\Vert _*)^{1/2}\nonumber \\&=(\Vert \mathcal {P}_{\mathcal {T}^\bot }(\Delta L^k )- \mathcal {P}_{\mathcal {H}}(\Delta L^k)\Vert \Vert \mathcal {P}_{\mathcal {T}^\bot }(\Delta L^k )- \mathcal {P}_{\mathcal {H}}(\Delta L^k)\Vert _*)^{1/2}\nonumber \\&\le \big [\frac{1}{l}\Vert \mathcal {P}_{\mathcal {H}}(\Delta L^k)\Vert _* (\Vert \mathcal {P}_{\mathcal {T}^\bot }(\Delta L^k)\Vert _* -\Vert \mathcal {P}_{\mathcal {H}}(\Delta L^k)\Vert _* )\big ]^{1/2} \le \frac{1}{2\sqrt{l}}\big \Vert \mathcal {P}_{\mathcal {T}^\bot }(\Delta L^k)\big \Vert _* \end{aligned}$$

where the second equality is using \(\mathcal {P}_{\mathcal {K}}(\Delta L^k)=\mathcal {P}_{\mathcal {T}}(\Delta L^k)+\mathcal {P}_{\mathcal {H}}(\Delta L^k)\), and the last inequality is using the fact that \(xy\le \frac{(x+y)^2}{4}\) for any \(x,y\in \mathbb {R}\). In addition,

$$\begin{aligned} \Vert \mathcal {P}_{\Gamma ^{c}}(\Delta S^k)\Vert _F&= \Vert \Delta S^k - \mathcal {P}_{\Gamma }(\Delta S^k)\Vert _F \le \big (\Vert \Delta S^k -\mathcal {P}_\Gamma (\Delta S^k) \Vert _\infty \Vert \Delta S^k -\mathcal {P}_\Gamma (\Delta S^k) \Vert _1\big )^{1/2}\nonumber \\&=\big (\Vert \mathcal {P}_{\Omega ^c}( \Delta S^k)-\mathcal {P}_{\Lambda }(\Delta S^k) \Vert _\infty \Vert \mathcal {P}_{\Omega ^c}( \Delta S^k)-\mathcal {P}_{\Lambda }(\Delta S^k) \Vert _1\big )^{1/2}\nonumber \\&\le \!\big [\frac{1}{t}\Vert \mathcal {P}_{\Lambda }(\Delta S^k)\Vert _1 (\Vert \mathcal {P}_{\Omega ^c}( \Delta S^k)\Vert _1 \!-\!\Vert \mathcal {P}_{\Lambda }(\Delta S^k)\Vert _1)\big ]^{1/2} \!\le \! \frac{ 1}{2\sqrt{t}} \Vert \mathcal {P}_{\Omega ^c}(\Delta S^k)\Vert _1. \end{aligned}$$
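The three steps of the chain above (a Hölder-type bound, the head-versus-tail comparison, and \(xy\le (x+y)^2/4\)) can be sanity-checked on a plain vector. The sketch below assumes that \(\Lambda \) indexes the \(t\) largest-magnitude entries of \(\Delta S^k\) outside \(\Omega \), as the argument suggests; the helper `tail_bound` and the random data are hypothetical, introduced only for this illustration.

```python
import math
import random


def tail_bound(values, t):
    """Return the l2-norm of the tail (all but the t largest magnitudes)
    followed by the three successive upper bounds used in the chain."""
    mags = sorted((abs(v) for v in values), reverse=True)
    head, tail = mags[:t], mags[t:]
    head_l1 = sum(head)
    tail_l1 = sum(tail)
    tail_inf = max(tail, default=0.0)
    tail_l2 = math.sqrt(sum(x * x for x in tail))
    # Step 1: ||x||_2 <= sqrt(||x||_inf * ||x||_1).
    step1 = math.sqrt(tail_inf * tail_l1)
    # Step 2: every tail entry is at most the average of the head entries.
    step2 = math.sqrt(head_l1 / t * tail_l1)
    # Step 3: x * y <= (x + y)^2 / 4 with x = head_l1 and y = tail_l1.
    step3 = (head_l1 + tail_l1) / (2.0 * math.sqrt(t))
    return tail_l2, step1, step2, step3


random.seed(2)
# Hypothetical stand-in for the entries of Delta S^k outside Omega.
entries = [random.uniform(-1.0, 1.0) for _ in range(50)]
chain = tail_bound(entries, t=8)
print(all(chain[i] <= chain[i + 1] + 1e-12 for i in range(3)))
```

The last quantity in the tuple corresponds to \(\frac{1}{2\sqrt{t}}\Vert \mathcal {P}_{\Omega ^c}(\Delta S^k)\Vert _1\); the same reasoning with singular values in place of entries gives the bound for \(\Vert \mathcal {P}_{\mathcal {K}^\bot }(\Delta L^k)\Vert _F\).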

Let \(\gamma ^{k-1}:=\min ( a_{k-1},b_{k-1})\). From the last two inequalities, we have

$$\begin{aligned} \Vert \mathcal {P}_{\mathcal {K}^\bot }(\Delta L^k)\Vert _F^2 + \Vert \mathcal {P}_{\Gamma ^c}(\Delta S^k)\Vert _F^2&\le \frac{1}{4l}\Vert \mathcal {P}_{\mathcal {T}^\bot }(\Delta L^k)\Vert _*^2 + \frac{1}{4t}\Vert \mathcal {P}_{\Omega ^c}(\Delta S^k)\Vert _1^2 \nonumber \\&=\frac{a_{k-1}^2}{4la_{k-1}^2} \Vert \mathcal {P}_{\mathcal {T}^\bot }(\Delta L^k)\Vert _*^2 +\frac{b_{k-1}^2}{4tb_{k-1}^2}\Vert \mathcal {P}_{\Omega ^c}(\Delta S^k)\Vert _1^2 \nonumber \\&\le \frac{1}{4\min (l,t)(\gamma ^{k-1})^2}\big (a_{k-1}^2\Vert \mathcal {P}_{\mathcal {T}^{\perp }}(\Delta L^k)\Vert _*^2 + b_{k-1}^2\Vert \mathcal {P}_{\Omega ^c}(\Delta S^k)\Vert _1^2\big )\nonumber \\&\le \frac{1}{4\min (l,t)(\gamma ^{k-1})^2}\big (a_{k-1} \big \Vert \mathcal {P}_{\mathcal {T}^\bot }(\Delta L^k)\big \Vert _* + b_{k-1}\Vert \mathcal {P}_{\Omega ^c}(\Delta S^k)\Vert _1\big )^2\nonumber \\&\le \frac{1}{4\min (l,t)(\gamma ^{k-1})^2}\big (\widehat{a}_{k-1} \big \Vert \mathcal {P}_{\mathcal {T}}(\Delta L^k)\big \Vert _* + \widehat{b}_{k-1}\Vert \mathcal {P}_\Omega (\Delta S^k)\Vert _1\big )^2\nonumber \\&\le \frac{1}{4\min (l,t)}\big (\sqrt{2r^*}\xi _{k-1}\big \Vert \mathcal {P}_{\mathcal {T}}(\Delta L^k)\big \Vert _F + \sqrt{s^*}\eta _{k-1}\Vert \mathcal {P}_\Omega (\Delta S^k)\Vert _F\big )^2\nonumber \\&\le \frac{1}{4\min (l,t)}\big (\sqrt{2r^*}\xi _{k-1}\big \Vert \mathcal {P}_{\mathcal {K}}(\Delta L^k)\big \Vert _F + \sqrt{s^*}\eta _{k-1}\Vert \mathcal {P}_\Gamma (\Delta S^k)\Vert _F\big )^2 \end{aligned}$$
(30)

where the third inequality uses the fact that \(x^2+y^2 \le (x+y)^2\) for any \(x,y\in \mathbb {R}_+ \) and the fourth inequality uses Lemma 4.1. Notice that \(\Vert \Delta L^k\Vert _F^2=\Vert \mathcal {P}_{\mathcal {K}}(\Delta L^k)\Vert _F^2+\Vert \mathcal {P}_{\mathcal {K}^{\perp }}(\Delta L^k)\Vert _F^2\) and \(\Vert \Delta S^k\Vert _F^2=\Vert \mathcal{P}_\Gamma (\Delta S^k)\Vert _F^2+\Vert \mathcal{P}_{\Gamma ^c}(\Delta S^k)\Vert _F^2\). From inequality (30), we readily obtain the desired result. The proof is completed.

1.3 The proof of Theorem 4.1

Let the subspaces \(\mathcal {H}\) and \(\mathcal {K}\) and the index sets \(\Gamma \) and \(\Lambda \) be defined as in Lemma 4.2. Let \(L_q:=L^k - q \mathcal{P}_\mathcal{K}(\Delta L^k )\) and \(S_q:=S^k - q \mathcal{P}_{\Gamma }(\Delta S^k)\) for any \(q\in (0,1)\). From Lemma 4.2 (a), the point \((L_q,S_q)\) is feasible to the problem (13). Together with the optimality of \((L^k,S^k)\) to the problem (13), we have that

$$\begin{aligned}&\frac{1}{2} \Vert \mathcal{A}(L^k+S^k)-M\Vert _F^2 + \lambda _{k-1}\big ( \Vert L^k\Vert _*-\langle W^{k-1},L^k\rangle \big ) + \nu _{k-1}\Vert T^{k-1}\circ S^k\Vert _1 \\ \le&\frac{1}{2} \Vert \mathcal{A}(L_q+S_q)-M\Vert _F^2 + \lambda _{k-1} \big (\Vert L_q\Vert _*-\langle W^{k-1},L_q\rangle \big ) +\nu _{k-1}\Vert T^{k-1} \circ S_q\Vert _1. \end{aligned}$$

Then we get that

$$\begin{aligned}&\frac{1}{2} \Vert \mathcal{A}(\Delta L^k+ \Delta S^k)\Vert _F^2 - \frac{1}{2} \Vert \mathcal{A}(L_q - L^*+ S_q - S^*)\Vert _F^2 \\ \le&\langle L^k - L_q + S^k -S_q, \mathcal{A}^*(N^*) \rangle + \lambda _{k-1}(\Vert L_q\Vert _*- \Vert L^k\Vert _*) \\&+ \lambda _{k-1} \langle W^{k-1}, L^k - L_q \rangle + \nu _{k-1} \big (\Vert T^{k-1}\circ S_q\Vert _1 - \Vert T^{k-1}\circ S^k\Vert _1 \big )\\ =&q \langle \mathcal{A}^*(N^*),\mathcal{P}_\mathcal{K}(\Delta L^k )\rangle + q \langle \mathcal{A}^*(N^*),\mathcal{P}_\Gamma (\Delta S^k) \rangle +q \lambda _{k-1} \langle W^{k-1}, \mathcal{P}_\mathcal{K}(\Delta L^k)\rangle \\&+ \lambda _{k-1}\big (\Vert q(\mathcal{P}_{\mathcal {T}^\bot }(\Delta L^k) - \mathcal{P}_\mathcal{H}(\Delta L^k)+ L^* ) +(1-q)L^k\Vert _*-\Vert L^k\Vert _* \big )\\&+ \nu _{k-1}\big (\Vert T^{k-1}\circ [q(S^*+\mathcal{P}_{\Omega ^c}(\Delta S^k)-\mathcal{P}_\Lambda (\Delta S^k))+(1-q)S^k]\Vert _1-\Vert T^{k-1}\circ S^k\Vert _1 \big ) \\ \le&q \langle \mathcal{A}^*(N^*),\mathcal{P}_\mathcal{K}(\Delta L^k )\rangle + q \langle \mathcal{A}^*(N^*),\mathcal{P}_\Gamma (\Delta S^k) \rangle \\&+ q\lambda _{k-1}\big (\Vert L^* + \mathcal {P}_{\mathcal {T}^\bot }(\Delta L^k) -\mathcal {P}_{\mathcal {H}}(\Delta L^k)\Vert _*-\Vert L^k\Vert _* +\langle W^{k-1}, \mathcal{P}_\mathcal{K}(\Delta L^k)\rangle \big )\\&+ q \nu _{k-1}\big (\Vert T^{k-1}\circ (S^*+\mathcal{P}_{\Omega ^c}(\Delta S^k)-\mathcal{P}_\Lambda (\Delta S^k))\Vert _1-\Vert T^{k-1}\circ S^k\Vert _1 \big ). \end{aligned}$$

From the proof of Lemma 4.2 (a), we get that \( U_1^*(V_1^*)^\mathbb {T} +U_2^* P \mathrm{Diag}(e) Q^\mathbb {T}(V_{2}^*)^\mathbb {T}\) is a subgradient of the nuclear norm at \(L^* + \mathcal {P}_{\mathcal {T}^\bot }(\Delta L^k) -\mathcal {P}_{\mathcal {H}}(\Delta L^k)\). Similarly, \( \mathrm{sgn}(S^*) + \mathrm{sgn}(\mathcal{P}_{\Omega ^c} (\Delta S^k))\) is a subgradient of the \(\ell _1\) norm at \(T^{k-1}\circ (S^*+\mathcal{P}_{\Omega ^c}(\Delta S^k)-\mathcal{P}_\Lambda (\Delta S^k))\). Thus, it follows that

$$\begin{aligned}&\frac{1}{2} \Vert \mathcal{A}(\Delta L^k+ \Delta S^k)\Vert _F^2 - \frac{1}{2} \Vert \mathcal{A}(L_q - L^*+ S_q - S^*)\Vert _F^2 \nonumber \\ \le&q \langle \mathcal{A}^*(N^*),\mathcal{P}_\mathcal{K}(\Delta L^k )\rangle + q \langle \mathcal{A}^*(N^*),\mathcal{P}_\Gamma (\Delta S^k) \rangle \nonumber \\&+ q\lambda _{k-1}\big (\langle W^{k-1}, \mathcal{P}_\mathcal{K}(\Delta L^k)\rangle - \langle U_1^*(V_1^*)^\mathbb {T} +U_2^* P \mathrm{Diag}(e) Q^\mathbb {T}(V_{2}^*)^\mathbb {T} ,\mathcal{P}_\mathcal{K}(\Delta L^k)\rangle \big )\nonumber \\&- q \nu _{k-1} \langle T^{k-1}\circ (\mathrm{sgn}(S^*) + \mathrm{sgn}(\mathcal{P}_{\Omega ^c} (\Delta S^k))),\mathcal{P}_\Gamma (\Delta S^k)\rangle . \end{aligned}$$
(31)

Note that

$$\begin{aligned}&\langle W^{k-1}, \mathcal{P}_\mathcal{K}(\Delta L^k)\rangle - \langle U_1^*(V_1^*)^\mathbb {T} +U_2^* P \mathrm{Diag}(e) Q^\mathbb {T}(V_{2}^*)^\mathbb {T} ,\mathcal{P}_\mathcal{K}(\Delta L^k)\rangle \nonumber \\ =&\langle W^{k-1}- U_1^*(V_1^*)^\mathbb {T}, \mathcal{P}_\mathcal{K}(\Delta L^k)\rangle -\langle U_2^* P \mathrm{Diag}(e) Q^\mathbb {T}(V_{2}^*)^\mathbb {T} ,\mathcal{P}_\mathcal{K}(\Delta L^k)\rangle \nonumber \\ =&\langle W^{k-1}- U_1^*(V_1^*)^\mathbb {T}, \mathcal{P}_\mathcal{T}(\Delta L^k)\rangle +\langle W^{k-1},\mathcal{P}_\mathcal{H}(\Delta L^k)\rangle - \Vert \mathcal{P}_\mathcal{H}(\Delta L^k)\Vert _* \\ \le&\Vert \mathcal{P}_\mathcal{H}(W^{k-1})\Vert \Vert \mathcal{P}_\mathcal{H}(\Delta L^k)\Vert _* \!+\!\Vert \mathcal{P}_\mathcal{T}(U_1^* (V_1^*)^\mathbb {T} - W^{k-1}) \Vert _F \Vert \mathcal{P}_\mathcal{T}(\Delta L^k)\Vert _F \!-\! \Vert \mathcal{P}_\mathcal{H}(\Delta L^k)\Vert _*, \nonumber \end{aligned}$$
(32)

where the second equality is obtained by \(\mathcal {K}:=\mathcal {T}\oplus \mathcal {H}\) and (29). In addition, it holds

$$\begin{aligned}&\langle T^{k-1}\circ (\mathrm{sgn}(S^*) + \mathrm{sgn}(\mathcal{P}_{\Omega ^c} (\Delta S^k))),\mathcal{P}_\Gamma (\Delta S^k)\rangle \nonumber \\ =&\langle T^{k-1}\circ \mathrm{sgn}(S^*),\mathcal{P}_{\Omega } (\Delta S^k)\rangle +\langle T^{k-1}\circ \mathrm{sgn}(\mathcal{P}_{\Omega ^c} (\Delta S^k)), \mathcal{P}_\Lambda (\Delta S^k)\rangle \nonumber \\ =&\langle T^{k-1} \circ \mathrm{sgn}(S^*), \mathcal{P}_\Omega (\Delta S^k) \rangle - \Vert T^{k-1} \circ \mathcal{P}_\Lambda (\Delta S^k) \Vert _1 \nonumber \\ \le&\Vert \mathcal{P}_\Omega (T^{k-1})\Vert _F \Vert \mathcal{P}_\Omega (\Delta S^k)\Vert _F- \Vert T^{k-1} \circ \mathcal{P}_\Lambda (\Delta S^k) \Vert _1 \end{aligned}$$
(33)

and

$$\begin{aligned}&\langle \mathcal{A}^*(N^*),\mathcal{P}_\mathcal{K}(\Delta L^k )\rangle + \langle \mathcal{A}^*(N^*),\mathcal{P}_\Gamma (\Delta S^k) \rangle \nonumber \\ \le&\Vert \mathcal{P}_\mathcal{T}(\mathcal{A}^*(N^*))\Vert _F \Vert \mathcal{P}_\mathcal{T}(\Delta L^k) \Vert _F + \Vert \mathcal{P}_\mathcal{H}(\mathcal{A}^*(N^*))\Vert \Vert \mathcal{P}_\mathcal{H}(\Delta L^k)\Vert _* \nonumber \\&+\Vert \mathcal{P}_\Omega (\mathcal{A}^*(N^*))\Vert _F \Vert \mathcal{P}_\Omega (\Delta S^k)\Vert _F + \Vert \mathcal{P}_\Lambda (\mathcal{A}^*(N^*))\Vert _\infty \Vert \mathcal{P}_\Lambda (\Delta S^k)\Vert _1. \end{aligned}$$
(34)

Combining (31)–(34), we get that

$$\begin{aligned}&\frac{1}{2} \Vert \mathcal{A}(\Delta L^k+ \Delta S^k)\Vert _F^2 - \frac{1}{2} \Vert \mathcal{A}(L_q - L^*+ S_q - S^*)\Vert _F^2 \nonumber \\ \le&q \Vert \mathcal{P}_\mathcal{T}(\mathcal{A}^*(N^*))\Vert _F \Vert \mathcal{P}_\mathcal{T}(\Delta L^k) \Vert _F +q \lambda _{k-1} \delta _{k-1} \Vert \mathcal{P}_\mathcal{T}(\Delta L^k)\Vert _F -qa_{k-1}\Vert \mathcal{P}_\mathcal{H}(\Delta L^k)\Vert _*\nonumber \\&+ q \Vert \mathcal{P}_\Omega (\mathcal{A}^*(N^*))\Vert _F \Vert \mathcal{P}_\Omega (\Delta S^k)\Vert _F +q \nu _{k-1} \overline{\delta }_{k-1} \Vert \mathcal{P}_\Omega (\Delta S^k)\Vert _F -qb_{k-1}\Vert \mathcal{P}_\Gamma (\Delta S^k)\Vert _1 \nonumber \\ \le&q \big (\Vert \mathcal{P}_\mathcal{T}(\mathcal{A}^*(N^*))\Vert _F + \lambda _{k-1} \delta _{k-1} \big ) \Vert \mathcal{P}_\mathcal{K}(\Delta L^k)\Vert _F \nonumber \\&+ q \big (\Vert \mathcal{P}_\Omega (\mathcal{A}^*(N^*))\Vert _F + \nu _{k-1} \overline{\delta }_{k-1} \big ) \Vert \mathcal{P}_\Gamma (\Delta S^k)\Vert _F, \end{aligned}$$
(35)

where \(\delta _{k-1} := \Vert \mathcal{P}_\mathcal{T}(U_1^* (V_1^*)^\mathbb {T}-W^{k-1})\Vert _F\) and \(\overline{\delta }_{k-1} := \Vert \mathcal{P}_\Omega (T^{k-1})\Vert _F\). On the other hand, for any \(q\in (0,1)\), we have that

$$\begin{aligned}&\frac{1}{2} \Vert \mathcal{A}(\Delta L^k+ \Delta S^k)\Vert _F^2 - \frac{1}{2} \Vert \mathcal{A}(L_q - L^*+ S_q - S^*)\Vert _F^2 \nonumber \\ =&\frac{1}{2} \Vert \mathcal{A}(\Delta L^k+ \Delta S^k)\Vert _F^2 - \frac{1}{2} \Vert \mathcal{A}(\Delta L^k+ \Delta S^k- q \mathcal{P}_\mathcal{K}(\Delta L^k)- q \mathcal{P}_\Gamma (\Delta S^k))\Vert _F^2 \nonumber \\ =&-\frac{q^2}{2} \Vert \mathcal{A}(\mathcal{P}_\mathcal{K}(\Delta L^k)+ \mathcal{P}_\Gamma (\Delta S^k))\Vert _F^2 + q\langle \mathcal{A}(\Delta L^k+ \Delta S^k),\mathcal{A}(\mathcal{P}_\mathcal{K}(\Delta L^k)+ \mathcal{P}_\Gamma (\Delta S^k))\rangle . \end{aligned}$$
(36)
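
The identity in (36) is the usual expansion of a squared norm and holds in any inner product space. As a quick numerical sanity check, the sketch below verifies it for random vectors. Here `x` stands in for \(\mathcal{A}(\Delta L^k+\Delta S^k)\) and `y` for \(\mathcal{A}(\mathcal{P}_\mathcal{K}(\Delta L^k)+\mathcal{P}_\Gamma (\Delta S^k))\); the vectors, their dimension, and the helper names are illustrative assumptions, not objects from the paper.

```python
import random

def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def gap(x, y, q):
    """Left side of (36): 0.5*||x||^2 - 0.5*||x - q*y||^2."""
    z = [a - q * b for a, b in zip(x, y)]
    return 0.5 * dot(x, x) - 0.5 * dot(z, z)

def expansion(x, y, q):
    """Right side of (36): -(q^2/2)*||y||^2 + q*<x, y>."""
    return -0.5 * q * q * dot(y, y) + q * dot(x, y)

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
x = [rng.gauss(0, 1) for _ in range(6)]
y = [rng.gauss(0, 1) for _ in range(6)]
for q in (0.5, 0.1, 0.01):
    assert abs(gap(x, y, q) - expansion(x, y, q)) < 1e-9
    # After dividing by q, the remaining q-dependent term is -(q/2)*||y||^2,
    # which vanishes as q -> 0+, leaving only <x, y>.
    print(q, gap(x, y, q) / q, dot(x, y))
```

The printed ratio `gap / q` approaches `dot(x, y)` as `q` shrinks, which is the quantity that survives in the limit \(q\rightarrow 0^+\).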

Combining (35) and (36), dividing both sides by \(q\), and taking the limit \(q\rightarrow 0^+\), we obtain that

$$\begin{aligned}&\big (\Vert \mathcal{P}_\mathcal{T}(\mathcal{A}^*(N^*))\Vert _F + \lambda _{k-1} \delta _{k-1} \big ) \Vert \mathcal{P}_\mathcal{K}(\Delta L^k)\Vert _F \nonumber \\&\ + \big (\Vert \mathcal{P}_\Omega (\mathcal{A}^*(N^*))\Vert _F + \nu _{k-1} \overline{\delta }_{k-1} \big ) \Vert \mathcal{P}_\Gamma (\Delta S^k)\Vert _F\nonumber \\ \ge&\langle \mathcal{A}(\Delta L^k+ \Delta S^k),\mathcal{A}(\mathcal{P}_\mathcal{K}(\Delta L^k)+ \mathcal{P}_\Gamma (\Delta S^k))\rangle \nonumber \\ =&\langle \mathcal{Q}(\Delta L^k), \mathcal{P}_\mathcal{K}(\Delta L^k)\rangle + \langle \mathcal{Q}(\Delta S^k), \mathcal{P}_\Gamma (\Delta S^k)\rangle \nonumber \\&\ + \langle \mathcal{Q}(\Delta L^k), \mathcal{P}_\Gamma (\Delta S^k)\rangle + \langle \Delta S^k, \mathcal{Q}(\mathcal{P}_\mathcal{K}(\Delta L^k))\rangle . \end{aligned}$$
(37)

Notice that \(\Vert \Delta L^k\Vert _\infty \le \Vert \Delta L^k\Vert \le 2\tau \). We have that

$$\begin{aligned}&|\langle \mathcal{Q}(\Delta L^k), \mathcal{P}_\Gamma (\Delta S^k)\rangle | \le \Vert \mathcal{Q}(\Delta L^k)\Vert _\infty \Vert \Delta S^k\Vert _1 \le \Vert \mathcal {Q}\Vert _\infty \Vert \Delta L^k\Vert _\infty \Vert \Delta S^k\Vert _1\le \\&\quad 2\tau \Vert \mathcal {Q}\Vert _\infty \Vert \Delta S^k\Vert _1. \end{aligned}$$

Similarly, we get that \( |\langle \Delta S^k, \mathcal{Q}(\mathcal{P}_\mathcal{K}(\Delta L^k))\rangle | \le 2\tau \Vert \mathcal {Q}\Vert _\infty \Vert \Delta S^k\Vert _1. \) Together with (37), we immediately obtain that

$$\begin{aligned}&\big (\Vert \mathcal{P}_\mathcal{T}(\mathcal{A}^*(N^*))\Vert _F + \lambda _{k-1} \delta _{k-1} \big ) \Vert \mathcal{P}_\mathcal{K}(\Delta L^k)\Vert _F \\&+ \big (\Vert \mathcal{P}_\Omega (\mathcal{A}^*(N^*))\Vert _F + \nu _{k-1} \overline{\delta }_{k-1} \big ) \Vert \mathcal{P}_\Gamma (\Delta S^k)\Vert _F + 4\tau \Vert \mathcal {Q}\Vert _\infty \Vert \Delta S^k\Vert _1\\ \ge&\langle \mathcal{Q}(\Delta L^k), \mathcal{P}_\mathcal{K}(\Delta L^k)\rangle + \langle \mathcal{Q}(\Delta S^k), \mathcal{P}_\Gamma (\Delta S^k)\rangle . \end{aligned}$$

Using Lemma 2.3 with \(G=\Delta S^k\), \(H=\mathcal {P}_{\Gamma }(\Delta S^k)\), and \(\mathcal {J}_1=\mathcal {H}\), \(\mathcal {I}=\mathcal {K},\widehat{G}=\Delta L^{k}\), \(\widehat{H}=\mathcal {P}_{\mathcal {K}}(\Delta L^k)\) yields that \(\Vert \mathcal{P}_\Gamma (G-H)\Vert _F=0, \Vert \mathcal{P}_{\mathcal {I}}(\widehat{G}-\widehat{H})\Vert _F=0\) and

$$\begin{aligned}&\big (\Vert \mathcal{P}_\mathcal{T}(\mathcal{A}^*(N^*))\Vert _F + \lambda _{k-1} \delta _{k-1} \big ) \Vert \mathcal{P}_\mathcal{K}(\Delta L^k)\Vert _F \nonumber \\&+ \big (\Vert \mathcal{P}_\Omega (\mathcal{A}^*(N^*))\Vert _F + \nu _{k-1} \overline{\delta }_{k-1} \big ) \Vert \mathcal{P}_\Gamma (\Delta S^k)\Vert _F + 4\tau \Vert \mathcal {Q}\Vert _\infty \Vert \Delta S^k\Vert _1\nonumber \\ \ge&\langle \mathcal {P}_{\mathcal {K}}(\Delta L^k),\mathcal {A}^*\mathcal {A}(\mathcal {P}_{\mathcal {K}}(\Delta L^k))\rangle \Big [1-\frac{\pi ( 2r^*+l,l)\Vert \mathcal {P}_{\mathcal {T}^{\perp }}(\Delta L^{k})\Vert _*}{l\Vert \mathcal {P}_{\mathcal {K}}(\Delta L^k)\Vert _F}\Big ]\nonumber \\&+ \langle \mathcal {P}_{\Gamma }(\Delta S^k),\mathcal{A}^* \mathcal{A}(\mathcal {P}_{\Gamma }(\Delta S^k))\rangle \left[ 1-\frac{\varpi (s^*+t,t)}{t}\frac{\Vert \mathcal {P}_{\Omega ^c}(\Delta S^k)\Vert _1}{\Vert \mathcal {P}_{\Gamma }(\Delta S^k)\Vert _F}\right] \nonumber \\ =&\Vert \mathcal{A}\mathcal {P}_{\mathcal {K}}(\Delta L^k) \Vert _F^2 - \frac{\pi (2r^*+l,l)}{l}\Vert \mathcal {P}_{\mathcal {T}^\bot }(\Delta L^k)\Vert _* \Vert \mathcal{A}\mathcal {P}_{\mathcal {K}}(\Delta L^k) \Vert _F^2 /\Vert \mathcal {P}_{\mathcal {K}}(\Delta L^k) \Vert _F \nonumber \\&\ + \Vert \mathcal{A}\mathcal {P}_{\Gamma }(\Delta S^k) \Vert _F^2 -\frac{ \varpi (s^*+t,t)}{t}\Vert \mathcal {P}_{\Omega ^c}(\Delta S^k)\Vert _1 \Vert \mathcal{A}\mathcal {P}_{\Gamma }(\Delta S^k) \Vert _F^2 /\Vert \mathcal {P}_{\Gamma }(\Delta S^k) \Vert _F \nonumber \\ \ge&\vartheta _{-}(2r^*+l)\Vert \mathcal {P}_{\mathcal {K}}(\Delta L^k)\Vert _F^2 - \frac{ \vartheta _{+}(2r^*+l) \pi (2r^*+l,l)}{l}\Vert \mathcal {P}_{\mathcal {T}^\bot }(\Delta L^k)\Vert _* \Vert \mathcal {P}_{\mathcal {K}}(\Delta L^k)\Vert _F \nonumber \\&\ +\chi _-(s^*+t) \Vert \mathcal {P}_{\Gamma }(\Delta S^k)\Vert _F^2 - \frac{\chi _+(s^*+t) \varpi (s^*+t,t)}{t}\Vert \mathcal {P}_{\Omega ^c}(\Delta S^k)\Vert _1 \Vert \mathcal {P}_{\Gamma }(\Delta S^k)\Vert _F \nonumber \\ \ge&\vartheta _{-}(2r^*+l) \big (1\!-\frac{\pi (2r^*+l,l)}{2l}\big )\Vert \mathcal {P}_{\mathcal {K}}(\Delta L^k)\Vert _F^2\! -\! \frac{ \pi (2r^*\!+l,l)\vartheta _{+}^2(2r^*+l)}{2l\vartheta _{-}(2r^*+l)}\Vert \mathcal {P}_{\mathcal {T}^\perp }(\Delta L^k)\Vert _*^2 \nonumber \\&+\chi _-(s^*+t) \big ( 1- \frac{\varpi (s^*+t,t)}{2t}\big ) \Vert \mathcal {P}_{\Gamma }(\Delta S^k)\Vert _F^2 - \frac{ \varpi (s^*+t,t)\chi _+^2(s^*+t)}{2t\chi _-(s^*+t)}\Vert \mathcal {P}_{\Omega ^c}(\Delta S^k)\Vert _1^2, \end{aligned}$$
(38)

where the second inequality uses the eigenvalue concepts of the linear operator \(\mathcal{A}\) in Sect. 2, and the last inequality uses the fact that \(xy\le \frac{x^2/z+z y^2}{2}\) for any \(x,y\in \mathbb {R}_+ \) and \(z>0\). Let \( \beta ^k\equiv \max \big (\frac{\sqrt{ \vartheta _{+}^2(2r^*+l) \pi (2r^*+l,l)}}{\sqrt{l \vartheta _{-}(2r^*+l)}a_{k-1}},\frac{\sqrt{ \chi _+^2(s^*+t) \varpi (s^*+t,t)}}{\sqrt{t \chi _-(s^*+t)}b_{k-1}}\big ). \) From Lemma 4.1 and the definition of \(\widetilde{\gamma }:=\max \big (\frac{\vartheta _{+}^2(2r^*+l) \pi (2r^*+l,l)}{l\vartheta _{-}(2r^*+l)},\frac{ \chi _+^2(s^*+t) \varpi (s^*+t,t)}{t\chi _-(s^*+t)}\big )\),

$$\begin{aligned}&\sqrt{\frac{ \vartheta _{+}^2(2r^*+l) \pi (2r^*+l,l)}{2l\vartheta _{-}(2r^*+l)}}\Vert \mathcal {P}_{\mathcal {T}^\bot }(\Delta L^k)\Vert _* + \sqrt{\frac{\chi _+^2(s^*+t) \varpi (s^*+t,t)}{2t\chi _-(s^*+t)}}\Vert \mathcal {P}_{\Omega ^c}(\Delta S^k)\Vert _1\nonumber \\ \le&\frac{1}{\sqrt{2}}\beta ^k\big (a_{k-1}\Vert \mathcal{P}_{\mathcal{T}^\bot }(\Delta L^k)\Vert _* + b_{k-1}\Vert \mathcal{P}_{\Omega ^c}(\Delta S^k)\Vert _1\big )\nonumber \\ \le&\frac{1}{\sqrt{2}} \beta ^k\big ( \widehat{a}_{k-1}\big \Vert \mathcal {P}_{\mathcal {T}}(\Delta L^k)\big \Vert _* + \widehat{b}_{k-1}\Vert \mathcal {P}_\Omega (\Delta S^k)\Vert _1 \big )\nonumber \\ \le&\frac{1}{\sqrt{2}} \beta ^k\big ( \sqrt{2r^*}\widehat{a}_{k-1}\big \Vert \mathcal {P}_{\mathcal {T}}(\Delta L^k)\big \Vert _F + \sqrt{s^*}\widehat{b}_{k-1}\Vert \mathcal {P}_\Omega (\Delta S^k)\Vert _F\big )\nonumber \\ \le&\sqrt{\widetilde{\gamma }/2} \left( \sqrt{2r^*}\xi _{k-1}\big \Vert \mathcal {P}_{\mathcal {K}}(\Delta L^k)\big \Vert _F + \sqrt{s^*}\eta _{k-1}\Vert \mathcal {P}_\Gamma (\Delta S^k)\Vert _F\right) \nonumber \\ \le&\sqrt{\widetilde{\gamma }\big ( 2r^* \xi _{k-1}^2\Vert \mathcal {P}_{\mathcal {K}}(\Delta L^k)\big \Vert _F^2 + s^*\eta _{k-1}^2 \Vert \mathcal {P}_\Gamma (\Delta S^k)\Vert _F^2 \big )} \end{aligned}$$
(39)

where the second-to-last inequality is due to \(\beta ^k\le \sqrt{\widetilde{\gamma }}/ \min (a_{k-1},b_{k-1})\) and the definitions \(\xi _{k-1}:=\widehat{a}_{k-1}/ \min ( a_{k-1} , b_{k-1})\) and \(\eta _{k-1}:=\widehat{b}_{k-1}/ \min ( a_{k-1}, b_{k-1})\), and the last inequality uses the fact that \(x+y\le \sqrt{2(x^2+y^2)}\) for any \(x,y\in \mathbb {R}\). From Lemma 4.1 and the definitions of \(\zeta _{k-1}:=\widehat{a}_{k-1}/b_{k-1}\) and \(\mu _{k-1}:=\widehat{b}_{k-1}/ b_{k-1}\), it follows that

$$\begin{aligned} \Vert \Delta S^k\Vert _1&\le \Vert \mathcal{P}_{\Omega ^c}(\Delta S^k)\Vert _1 +\Vert \mathcal{P}_\Omega (\Delta S^k)\Vert _1 \nonumber \\&\le \zeta _{k-1}\big \Vert \mathcal {P}_{\mathcal {T}}(\Delta L^k)\big \Vert _* + (1+\mu _{k-1})\Vert \mathcal {P}_\Omega (\Delta S^k)\Vert _1\nonumber \\&\le \zeta _{k-1}\sqrt{2r^*}\Vert \mathcal {P}_{\mathcal {T}}(\Delta L^k)\Vert _F + \sqrt{s^*}(1+\mu _{k-1})\Vert \mathcal {P}_\Omega (\Delta S^k)\Vert _F\nonumber \\&\le \zeta _{k-1}\sqrt{2r^*}\Vert \mathcal {P}_{\mathcal {K}}(\Delta L^k)\Vert _F + \sqrt{s^*}(1+\mu _{k-1})\Vert \mathcal {P}_\Gamma (\Delta S^k)\Vert _F. \end{aligned}$$
(40)
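
The third and fourth inequalities in (39) and (40) rest on two elementary norm comparisons for the sparse part: an \(\ell _1\)-norm over at most \(s\) entries is bounded by \(\sqrt{s}\) times the Frobenius norm, and restricting to a subset of entries never increases the Frobenius norm. The sketch below checks both on random vectors, assuming (as the notation suggests) that \(\mathcal{P}_\Omega \) keeps the entries indexed by a support of size \(s^*\) contained in the index set of \(\mathcal{P}_\Gamma \). The dimension `n`, the support size `s`, and the function names are hypothetical choices for illustration only.

```python
import math
import random

def l1_norm(v):
    """Sum of absolute values of the entries."""
    return sum(abs(a) for a in v)

def l2_norm(v):
    """Euclidean (Frobenius) norm of the entries."""
    return math.sqrt(sum(a * a for a in v))

def restrict(v, support):
    """Zero out every entry of v whose index is not in `support`."""
    return [a if i in support else 0.0 for i, a in enumerate(v)]

rng = random.Random(1)  # fixed seed, chosen arbitrarily
n, s = 50, 7            # hypothetical ambient dimension and support size
support = set(rng.sample(range(n), s))
for _ in range(1000):
    v = [rng.gauss(0, 1) for _ in range(n)]
    p = restrict(v, support)
    # Cauchy-Schwarz on at most s nonzero entries: ||p||_1 <= sqrt(s) * ||p||_2.
    assert l1_norm(p) <= math.sqrt(s) * l2_norm(p) + 1e-12
    # Restricting to a subset of entries never increases the Euclidean norm.
    assert l2_norm(p) <= l2_norm(v) + 1e-12
```

The analogous step for the low-rank part, \(\Vert \mathcal {P}_{\mathcal {T}}(\Delta L^k)\Vert _*\le \sqrt{2r^*}\Vert \mathcal {P}_{\mathcal {T}}(\Delta L^k)\Vert _F\), is the same Cauchy-Schwarz argument applied to singular values and is not covered by this vector-only check.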

Combining inequalities (38)–(40) and noticing that \(x^2+y^2 \le (x+y)^2\) holds for any \(x,y\in \mathbb {R}_+\), we obtain that

$$\begin{aligned}&\big [ \vartheta _{-}(2r^*+l) \big (1-\frac{\pi (2r^*+l,l)}{2l}\big ) - 2r^*\widetilde{\gamma }\xi _{k-1}^2 \big ]\Vert \mathcal {P}_{\mathcal {K}}(\Delta L^k)\Vert _F^2 \nonumber \\&\ + \big [ \chi _-(s^*+t)\big (1- \frac{\varpi (s^*+t,t)}{2t}\big )- s^*\widetilde{\gamma } \eta _{k-1}^2\big ]\Vert \mathcal {P}_\Gamma (\Delta S^k)\Vert _F^2 \nonumber \\ \le&\big [4\tau \Vert \mathcal{Q}\Vert _\infty \zeta _{k-1}\sqrt{2r^*} + \Vert \mathcal{P}_\mathcal{T}(\mathcal{A}^*(N^*))\Vert _F + \lambda _{k-1} \delta _{k-1} \big ]\Vert \mathcal {P}_{\mathcal {K}}(\Delta L^k)\Vert _F \nonumber \\&\ + \big [4\tau \Vert \mathcal{Q}\Vert _\infty \sqrt{s^*}(1+\mu _{k-1}) + \Vert \mathcal{P}_\Omega (\mathcal{A}^*(N^*))\Vert _F + \nu _{k-1} \overline{\delta }_{k-1} \big ] \Vert \mathcal {P}_\Gamma (\Delta S^k)\Vert _F. \end{aligned}$$
(41)

Combining Assumption 4.1 with Lemmas 2.1 and 2.2, we get \(\frac{\varpi (s^*+t,t)}{2t}\le c_2 \) and \(\frac{\pi (2r^*+l,l)}{2l}\le c_1\), and hence \(\widetilde{\gamma }\le 2\max \big (c_1 \frac{\vartheta _{+}^2(2r^*+l)}{\vartheta _{-}(2r^*+l)} , c_2 \frac{\chi _+^2(s^*+t)}{\chi _-(s^*+t)}\big ) \), which implies

$$\begin{aligned}&\frac{\vartheta _{-}(2r^*+l)\big (1-\frac{\pi (2r^*+l,l)}{2l}\big )}{2r^*\widetilde{\gamma }}\ge \frac{1-c_1}{2r^* \gamma _1}>\xi _{k-1}^2, \\&\frac{\chi _-(s^*+t) \big ( 1- \frac{\varpi (s^*+t,t)}{2t}\big )}{s^*\widetilde{\gamma }}\ge \frac{1-c_2}{s^*\gamma _2}>\eta _{k-1}^2 \end{aligned}$$

where

$$\begin{aligned}&\gamma _1:=2\max \big (c_1 \frac{\vartheta _{+}^2(2r^*+l)}{\vartheta _{-}^2(2r^*+l)} , c_2 \frac{\chi _+^2(s^*+t)}{\vartheta _{-}(2r^*+l)\chi _-(s^*+t)}\big ), \\&\gamma _2:=2\max \big (c_1 \frac{\vartheta _{+}^2(2r^*+l)}{\vartheta _{-}(2r^*+l)\chi _-(s^*+t)} , c_2 \frac{\chi _+^2(s^*+t)}{\chi _-^2(s^*+t)}\big ) \end{aligned}$$

and the assumptions on \(\xi _{k-1}^2\) and \(\eta _{k-1}^2\) are used. Hence the coefficients on the left-hand side of (41) are positive.

Then, by the definitions of \(\widetilde{a}_{k-1},\widetilde{b}_{k-1},\overline{a}_{k-1}\) and \(\overline{b}_{k-1}\) in (21), we rewrite (41) as

$$\begin{aligned} \widetilde{a}_{k-1} \Vert \mathcal {P}_{\mathcal {K}}(\Delta L^k)\Vert _F^2 + \widetilde{b}_{k-1}\Vert \mathcal {P}_\Gamma (\Delta S^k)\Vert _F^2 \le \overline{a}_{k-1} \Vert \mathcal {P}_{\mathcal {K}}(\Delta L^k)\Vert _F + \overline{b}_{k-1} \Vert \mathcal {P}_\Gamma (\Delta S^k)\Vert _F \end{aligned}$$

where \(\widetilde{a}_{k-1},\widetilde{b}_{k-1}>0\). It is not hard to obtain that

$$\begin{aligned}&\overline{a}_{k-1} \Vert \mathcal {P}_{\mathcal {K}}(\Delta L^k)\Vert _F + \overline{b}_{k-1} \Vert \mathcal {P}_\Gamma (\Delta S^k)\Vert _F \\ \le&\frac{\frac{\overline{a}_{k-1}^2}{\widetilde{a}_{k-1}}+\widetilde{a}_{k-1} \Vert \mathcal {P}_{\mathcal {K}}(\Delta L^k)\Vert _F^2}{2}+\frac{\frac{\overline{b}_{k-1}^2}{\widetilde{b}_{k-1}} +\widetilde{b}_{k-1} \Vert \mathcal {P}_\Gamma (\Delta S^k)\Vert _F^2}{2}, \end{aligned}$$

which implies

$$\begin{aligned} \Vert \mathcal {P}_{\mathcal {K}}(\Delta L^k)\Vert _F \le \sqrt{\frac{\overline{a}_{k-1}^2}{\widetilde{a}_{k-1}^2} + \frac{\overline{b}_{k-1}^2}{\widetilde{a}_{k-1}\widetilde{b}_{k-1}}}:=\theta _{k-1} \end{aligned}$$

and

$$\begin{aligned} \Vert \mathcal {P}_\Gamma (\Delta S^k)\Vert _F\le \sqrt{\frac{\overline{b}_{k-1}^2}{\widetilde{b}_{k-1}^2} + \frac{\overline{a}_{k-1}^2}{\widetilde{a}_{k-1}\widetilde{b}_{k-1}}}:=\overline{\theta }_{k-1}. \end{aligned}$$
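
The two bounds follow by substituting the preceding inequality into the rewritten form of (41), cancelling half of each quadratic term, and dropping the nonnegative term in the other variable. As a hedged numerical sanity check, the sketch below samples positive coefficients and nonnegative pairs `(x, y)` playing the roles of \(\Vert \mathcal {P}_{\mathcal {K}}(\Delta L^k)\Vert _F\) and \(\Vert \mathcal {P}_\Gamma (\Delta S^k)\Vert _F\), keeps those that satisfy the rewritten inequality, and confirms that they respect \(\theta _{k-1}\) and \(\overline{\theta }_{k-1}\). The sampling ranges and the name `theta_bounds` are assumptions made for illustration.

```python
import math
import random

def theta_bounds(a_t, b_t, a_bar, b_bar):
    """Return (theta, theta_bar) as defined in the two displayed bounds.

    a_t, b_t stand for the tilde coefficients (assumed positive);
    a_bar, b_bar stand for the barred coefficients.
    """
    theta = math.sqrt(a_bar**2 / a_t**2 + b_bar**2 / (a_t * b_t))
    theta_bar = math.sqrt(b_bar**2 / b_t**2 + a_bar**2 / (a_t * b_t))
    return theta, theta_bar

rng = random.Random(2)  # fixed seed, chosen arbitrarily
checked = 0
for _ in range(20000):
    a_t, b_t, a_bar, b_bar = (rng.uniform(0.1, 5.0) for _ in range(4))
    x, y = rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0)
    # Keep only pairs (x, y) that satisfy the rewritten form of (41).
    if a_t * x * x + b_t * y * y <= a_bar * x + b_bar * y:
        theta, theta_bar = theta_bounds(a_t, b_t, a_bar, b_bar)
        assert x <= theta + 1e-12 and y <= theta_bar + 1e-12
        checked += 1
print("feasible samples checked:", checked)
```

Random sampling only illustrates the implication; the proof itself is the algebraic argument in the text.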

Substituting the bounds on \(\Vert \mathcal {P}_{\mathcal {K}}(\Delta L^k)\Vert _F\) and \(\Vert \mathcal {P}_{\Gamma }(\Delta S^k)\Vert _F\) into (18) gives the desired result. The proof is complete.


Cite this article

Han, L., Bi, S. Two-stage convex relaxation approach to low-rank and sparsity regularized least squares loss. J Glob Optim 70, 71–97 (2018). https://doi.org/10.1007/s10898-017-0573-2
