
Two-stage convex relaxation approach to least squares loss constrained low-rank plus sparsity optimization problems


Abstract

This paper is concerned with the least squares loss constrained low-rank plus sparsity optimization problems that seek a low-rank matrix and a sparse matrix by minimizing a positive combination of the rank function and the zero norm over a least squares constraint set describing the observation or prior information on the target matrix pair. For this class of NP-hard optimization problems, we propose a two-stage convex relaxation approach via the majorization of suitable locally Lipschitz continuous surrogates, which has a remarkable advantage in reducing the error yielded by the popular nuclear norm plus \(\ell _1\)-norm convex relaxation method. Under a suitable restricted eigenvalue condition, we establish a Frobenius norm error bound for the optimal solution of each stage and show that the error bound of the first-stage convex relaxation (i.e., the nuclear norm plus \(\ell _1\)-norm convex relaxation) can be substantially reduced by the second-stage convex relaxation, thereby providing a theoretical guarantee for the two-stage convex relaxation approach. We also verify the efficiency of the proposed approach by applying it to some random test problems and to some problems with real data arising from specularity removal in face images and from foreground/background separation in surveillance videos.
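To make the two-stage scheme concrete, here is a minimal numerical sketch (not the authors' solver) of one plausible instantiation: the sampling operator \({\mathcal {A}}\) is taken to be the identity, the first stage is the nuclear norm plus \(\ell _1\)-norm relaxation over the least squares ball, and the second stage subtracts a linear term built from the leading singular vectors of the first-stage solution and reweights the \(\ell _1\) term. The package cvxpy and the choices of lam, delta, the rank estimate r and the weight rule are illustrative assumptions, not quantities from the paper.

```python
# A minimal sketch of the two-stage idea (illustrative assumptions throughout).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n1, n2, r, s = 30, 30, 2, 40
L_true = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))
S_true = np.zeros((n1, n2))
S_true.flat[rng.choice(n1 * n2, s, replace=False)] = 5 * rng.standard_normal(s)
M = L_true + S_true + 0.01 * rng.standard_normal((n1, n2))  # noisy observation

lam = 0.2
delta = 0.05 * np.linalg.norm(M, 'fro')  # radius of the least squares constraint

# Stage 1: nuclear norm plus l1-norm relaxation over the least squares ball.
L, S = cp.Variable((n1, n2)), cp.Variable((n1, n2))
cons = [cp.norm(L + S - M, 'fro') <= delta]
cp.Problem(cp.Minimize(cp.normNuc(L) + lam * cp.sum(cp.abs(S))), cons).solve()
L1, S1 = L.value, S.value

# Stage 2: one plausible convex surrogate that reduces the stage-1 bias by
# subtracting a linear term built from the leading singular vectors of L1
# and by downweighting the l1 penalty on the large entries of S1.
U, _, Vt = np.linalg.svd(L1)
W = U[:, :r] @ Vt[:r, :]                                   # linear term, still convex
T = 1.0 / (1.0 + np.abs(S1) / (np.abs(S1).max() + 1e-12))  # weights in [1/2, 1]
obj2 = cp.normNuc(L) - cp.trace(W.T @ L) + lam * cp.sum(cp.multiply(T, cp.abs(S)))
cp.Problem(cp.Minimize(obj2), cons).solve()

print('stage-1 error:', np.linalg.norm(L1 - L_true, 'fro'))
print('stage-2 error:', np.linalg.norm(L.value - L_true, 'fro'))
```

Both stages are solved over the same least squares ball; the second-stage objective remains convex because only linear terms are subtracted from the nuclear norm and the weighted \(\ell _1\)-norm.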



Acknowledgments

The authors would like to thank the two anonymous referees for their helpful suggestions on the revision of the original manuscript. This work was supported by the National Natural Science Foundation of China under project Nos. 11501219 and 11571120, the Natural Science Foundation of Guangdong Province under project Nos. 2015A030313214 and 2015A030310298, and the Fundamental Research Funds for the Central Universities (SCUT).

Author information


Correspondence to Shaohua Pan.

Appendix

Lemma 6.1

Let \({\mathcal {T}}\) be the subspace given by (3). Then, for any \(Z\in {\mathbb {R}}^{n_1\times n_2}\), it holds that

$$\begin{aligned}&{\mathcal {P}}_{{\mathcal {T}}}(Z) = U_1^*(U_1^*)^{{\mathbb {T}}}Z + ZV_1^*(V_1^*)^{{\mathbb {T}}}-U_1^*(U_1^*)^{{\mathbb {T}}} ZV_1^*(V_1^*)^{{\mathbb {T}}} \ \ \mathrm{and} \\&{\mathcal {P}}_{{\mathcal {T}}^{\perp }}(Z) = U_2^*(U_2^*)^{{\mathbb {T}}}ZV_2^*(V_2^*)^{{\mathbb {T}}}. \end{aligned}$$

Proof

Let \({\mathcal {B}}:{\mathbb {R}}^{n_1\times n_2}\rightarrow {\mathbb {R}}^{n_1\times n_2}\) be defined by \({\mathcal {B}}(X)= U_1^*(U_1^*)^{{\mathbb {T}}}XV_2^* (V_2^*)^{{\mathbb {T}}}+XV_1^*(V_1^*)^{{\mathbb {T}}}\) for \(X\in {\mathbb {R}}^{n_1\times n_2}\). Then, from the definition of the subspace \({\mathcal {T}}\), it follows that

$$\begin{aligned} {\mathcal {P}}_{{\mathcal {T}}}(Z) = \mathop {\arg \min }_{X\in {\mathbb {R}}^{n_1\times n_2}}\left\{ \frac{1}{2} \big \Vert X-Z\big \Vert _F^2:\ X={\mathcal {B}}(X)\right\} . \end{aligned}$$
(20)

By the necessary optimality conditions of (20), there exists \(Y\in {\mathbb {R}}^{n_1\times n_2}\) such that

$$\begin{aligned} {\mathcal {P}}_{{\mathcal {T}}}(Z)-Z-Y+{\mathcal {B}}^*(Y)=0\ \ \mathrm{and}\ \ {\mathcal {P}}_{{\mathcal {T}}}(Z) = {\mathcal {B}} ({\mathcal {P}}_{{\mathcal {T}}}(Z)). \end{aligned}$$

By the expression of \({\mathcal {B}}\), it is easy to check that \({\mathcal {B}}\) is self-adjoint and \({\mathcal {B}}({\mathcal {B}}(X))={\mathcal {B}}(X)\) for any \(X\in {\mathbb {R}}^{n_1\times n_2}\). Together with the last equation, we deduce that

$$\begin{aligned}&{\mathcal {B}}({\mathcal {P}}_{{\mathcal {T}}}(Z)) - {\mathcal {B}}(Z)-{\mathcal {B}}(Y)+{\mathcal {B}}({\mathcal {B}}^*(Y))=0 \Longleftrightarrow {\mathcal {P}}_{{\mathcal {T}}}(Z) - {\mathcal {B}}(Z)-{\mathcal {B}}(Y)+{\mathcal {B}}(Y)=0, \end{aligned}$$

which implies that \({\mathcal {P}}_{{\mathcal {T}}}(Z)={\mathcal {B}}(Z)\). Since \(V_2^*(V_2^*)^{{\mathbb {T}}}=I-V_1^*(V_1^*)^{{\mathbb {T}}}\), the matrix \({\mathcal {B}}(Z)\) coincides with the stated expression of \({\mathcal {P}}_{{\mathcal {T}}}(Z)\). Using the same arguments, one may easily obtain the expression of \({\mathcal {P}}_{{\mathcal {T}}^{\perp }}(Z)\). Thus, we complete the proof. \(\square \)
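As a quick numerical sanity check of Lemma 6.1 (a sketch assuming numpy; here \(U_1^*,V_1^*,U_2^*,V_2^*\) are built from the SVD of a random rank-\(r\) matrix, an illustrative stand-in for the matrices of (3)), one can verify that the two formulas define complementary idempotent projections:

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, r = 6, 5, 2
A = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))  # rank-r matrix
U, _, Vt = np.linalg.svd(A)
U1, U2 = U[:, :r], U[:, r:]          # column space basis and its complement
V1, V2 = Vt[:r, :].T, Vt[r:, :].T    # row space basis and its complement

def P_T(Z):
    return U1 @ U1.T @ Z + Z @ V1 @ V1.T - U1 @ U1.T @ Z @ V1 @ V1.T

def P_Tperp(Z):
    return U2 @ U2.T @ Z @ V2 @ V2.T

Z = rng.standard_normal((n1, n2))
assert np.allclose(P_T(Z) + P_Tperp(Z), Z)           # the two parts recover Z
assert np.allclose(P_T(P_T(Z)), P_T(Z))              # P_T is idempotent
assert np.allclose(P_Tperp(P_Tperp(Z)), P_Tperp(Z))  # and so is P_Tperp
```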

The proof of Lemma 2.2

It suffices to consider the case where the right-hand side is positive (otherwise the inequality holds trivially), which implies that \(\langle H,{\mathcal {A}}^*{\mathcal {A}}(G)\rangle >0\) and hence \(\langle H,{\mathcal {A}}^*{\mathcal {A}}(H)\rangle >0\). Without loss of generality, we assume that \(n_1\le n_2\), that \(\Omega \) takes the form

$$\begin{aligned} \Omega =\left\{ (i,j)\ |\ i+(j-1)n_1\le s^*\ \ \mathrm{with}\ i\in \{1,\ldots ,n_1\},\ j\in \left\{ 1,\ldots ,\left\lfloor \frac{s^*}{n_1}\right\rfloor +1\right\} \right\} , \end{aligned}$$
(21)

where \(\lfloor \frac{s^*}{n_1}\rfloor \) denotes the greatest integer not exceeding \(\frac{s^*}{n_1}\), and that the components \(G_{ij}\) with \(i+(j-1)n_1>s^*\) are arranged in descending order of magnitude, column by column, i.e.,

$$\begin{aligned} |G_{i_0+1,j_0}|\ge & {} \cdots \ge |G_{n_1,j_0}|\ge |G_{1,j_0+1}|\ge \cdots \ge |G_{n_1,j_0+1}| \\\ge & {} \cdots \ge |G_{1,n_2}|\ge \cdots \ge |G_{n_1,n_2}|, \end{aligned}$$

where \(i_0\) and \(j_0\) are positive integers such that \(i_0+n_1(j_0-1)=s^*\). For \(k=1,2,\ldots \), let

$$\begin{aligned}&\Omega _k:=\Big \{(i,j)\ |\ s^*+(k-1)t<i+(j-1)n_1\le s^*+kt\ \ \mathrm{for}\ i\in \{1,\ldots ,n_1\}, \\&\qquad \quad j\in \big \{\big \lfloor \frac{s^*}{n_1}\big \rfloor + 1,\ldots ,\big \lfloor \frac{s^*+kt}{n_1}\big \rfloor +1\big \}\Big \}, \end{aligned}$$

except that the largest column index in the last block stops at \(n_2\). By comparing with (21), it is immediate to see that \(\Omega =\Omega _0\) and \(\Lambda =\Omega _1\), and consequently \(\Gamma =\Omega _0\cup \Omega _1\). In addition, from the order of \(|G_{ij}|\) for \(i+(j-1)n_1>s^*\), we have \(\Vert {\mathcal {P}}_{\Omega _k}(G)\Vert _{\infty }\le \frac{\Vert {\mathcal {P}}_{\Omega _{k-1}}(G)\Vert _{1}}{t}\) when \(k>1\), which implies that \(\sum _{k>1}\Vert {\mathcal {P}}_{\Omega _k}(G)\Vert _{\infty }\le \frac{\Vert {\mathcal {P}}_{\Omega ^{c}}(G)\Vert _{1}}{t}\). Then, we have that

$$\begin{aligned}&\big \langle H,{\mathcal {A}}^*{\mathcal {A}}(G)\big \rangle -\big \langle H,{\mathcal {A}}^*{\mathcal {A}} ({\mathcal {P}}_{\Gamma } (G-H))\big \rangle \\&\quad =\big \langle H,{\mathcal {A}}^*{\mathcal {A}}(H)\big \rangle + \big \langle H, {\mathcal {A}}^*{\mathcal {A}}({\mathcal {P}}_{\Gamma ^c}(G))\big \rangle \\&\quad =\big \langle H,{\mathcal {A}}^*{\mathcal {A}}(H)\big \rangle + \sum _{k>1}\big \langle H, {\mathcal {A}}^*{\mathcal {A}}({\mathcal {P}}_{\Omega _k}(G))\big \rangle \\&\quad =\big \langle H,{\mathcal {A}}^*{\mathcal {A}}(H)\big \rangle \left[ 1+\frac{\sum _{k>1}\big \langle H, {\mathcal {A}}^*{\mathcal {A}}{\mathcal {P}}_{\Omega _k}(G)\big \rangle }{\big \langle H,{\mathcal {A}}^*{\mathcal {A}}(H)\big \rangle }\right] \\&\quad \ge \big \langle H,{\mathcal {A}}^*{\mathcal {A}}(H) \big \rangle \left[ 1-\varpi (s^*+t,t) \sum _{k>1}\frac{\Vert {\mathcal {P}}_{\Omega _k}(G)\Vert _{\infty }}{\Vert H\Vert _F}\right] \\&\quad \ge \langle H,{\mathcal {A}}^*{\mathcal {A}} (H)\rangle \left[ 1-\frac{\varpi (s^* + t,t)}{t}\frac{\Vert {\mathcal {P}}_{\Omega ^c}(G)\Vert _1}{\Vert H\Vert _F}\right] \\&\quad \ge \chi _{-}(s^*+t)\left[ \Vert H\Vert _F - \frac{\varpi (s^*+t,t)}{t} \Vert {\mathcal {P}}_{\Omega ^c}(G)\Vert _1\right] \Vert H\Vert _F, \end{aligned}$$

where the second equality uses \(\Gamma =\Omega _0\cup \Omega _1\), the third one is due to \(\langle H,{\mathcal {A}}^*{\mathcal {A}}(H)\rangle >0\), and the first inequality follows from the definition of \(\varpi (\cdot ,\cdot )\). Combining the last inequality with

$$\begin{aligned} \langle H,{\mathcal {A}}^*{\mathcal {A}}({\mathcal {P}}_{\Gamma }(G-H))\rangle \ge -\chi _{+}(s^*+t,t)\Vert H\Vert _F\Vert {\mathcal {P}}_{\Gamma }(G-H)\Vert _F, \end{aligned}$$

we immediately obtain the desired result. Thus, we complete the proof. \(\square \)
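The combinatorial ingredient of the above proof is worth isolating: once the entries of \(G\) outside \(\Omega \) are sorted in descending magnitude and grouped into consecutive blocks of size \(t\), the largest entry of each block is at most the average of the previous block, which is exactly the bound \(\Vert {\mathcal {P}}_{\Omega _k}(G)\Vert _{\infty }\le \Vert {\mathcal {P}}_{\Omega _{k-1}}(G)\Vert _{1}/t\). A small illustration (a sketch assuming numpy; the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
g = np.sort(np.abs(rng.standard_normal(100)))[::-1]  # |G_ij| outside Omega, sorted
t = 7
blocks = [g[i:i + t] for i in range(0, g.size, t)]   # Omega_1, Omega_2, ...
for prev, cur in zip(blocks, blocks[1:]):
    # max of a block is bounded by the average of the previous (full) block
    assert cur.max() <= prev.sum() / t
```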

The proof of Theorem 4.1

Let the subspaces \({\mathcal {H}}\) and \({\mathcal {K}}\) and the index sets \(\Gamma \) and \(\Lambda \) be defined as in Lemma 4.2. Using Lemma 2.4 with \({\mathcal {J}}_1={\mathcal {H}}, {\mathcal {I}}={\mathcal {K}},G=\Delta L^{k}\) and \(H={\mathcal {P}}_{{\mathcal {K}}}(\Delta L^k)\) yields that

$$\begin{aligned}&\max \left( 0,\frac{\langle {\mathcal {P}}_{{\mathcal {K}}} (\Delta L^k), {\mathcal {Q}}(\Delta L^k)\rangle }{\Vert {\mathcal {P}}_{{\mathcal {K}}}(\Delta L^k)\Vert _F}\right) \\&\quad \ge \vartheta _{-}(2r^*+l)\Big (\Vert {\mathcal {P}}_{{\mathcal {K}}} (\Delta L^k)\Vert _F -\frac{\pi (2r^*+l,l)}{l}\Vert {\mathcal {P}}_{{\mathcal {T}}^\bot } (\Delta L^k)\Vert _*\Big ). \end{aligned}$$

Using Lemma 2.2 with \(H={\mathcal {P}}_{\Gamma }(\Delta S^k)\) and \(G=\Delta S^k\) then yields that

$$\begin{aligned}&\max \left( 0,\frac{\big \langle {\mathcal {P}}_{\Gamma }(\Delta S^k),{\mathcal {Q}}(\Delta S^k)\big \rangle }{\Vert {\mathcal {P}}_{\Gamma }(\Delta S^k)\Vert _F}\right) \\&\quad \ge \chi _-(s^*+t)\Big (\Vert {\mathcal {P}}_{\Gamma }(\Delta S^k)\Vert _F - \frac{\varpi (s^*+t,t)}{t}\Vert {\mathcal {P}}_{\Omega ^c}(\Delta S^k)\Vert _1\Big ). \end{aligned}$$

In addition, from the definitions of \(\vartheta _{+}(\cdot )\) and \(\chi _{+}(\cdot )\), it follows that

$$\begin{aligned} \big \langle {\mathcal {P}}_{{\mathcal {K}}}(\Delta L^k), {\mathcal {Q}}(\Delta L^k )\big \rangle&\le \Vert {\mathcal {A}}{\mathcal {P}}_{{\mathcal {K}}}(\Delta L^k)\Vert _F\Vert {\mathcal {A}}(\Delta L^k )\Vert _F \\&\le \sqrt{\vartheta _+(2r^*+l)}\Vert {\mathcal {P}}_{{\mathcal {K}}}(\Delta L^k)\Vert _F\Vert {\mathcal {A}}(\Delta L^k )\Vert _F,\\ \big \langle {\mathcal {P}}_{\Gamma }(\Delta S^k), {\mathcal {Q}}(\Delta S^k )\big \rangle&\le \Vert {\mathcal {A}}{\mathcal {P}}_{\Gamma }(\Delta S^k)\Vert _F\Vert {\mathcal {A}}(\Delta S^k )\Vert _F \\&\le \sqrt{\chi _+(s^*+t)}\Vert {\mathcal {P}}_{\Gamma }(\Delta S^k)\Vert _F\Vert {\mathcal {A}}(\Delta S^k )\Vert _F. \end{aligned}$$

Combining the above four inequalities, we immediately obtain that

$$\begin{aligned}&\big \Vert {\mathcal {P}}_{{\mathcal {K}}}(\Delta L^k)\big \Vert _F +\big \Vert {\mathcal {P}}_{\Gamma }(\Delta S^k)\big \Vert _F -\frac{\pi (2r^*+l,l)}{l}\big \Vert {\mathcal {P}}_{{\mathcal {T}}^\bot }(\Delta L^k)\big \Vert _* \nonumber \\&\qquad -\frac{\varpi (s^*+t,t)}{t} \big \Vert {\mathcal {P}}_{\Omega ^c}(\Delta S^k)\big \Vert _1 \nonumber \\&\quad \le \frac{\sqrt{\vartheta _+(2r^*+ l)}}{\vartheta _{-}(2r^*+l)}\big \Vert {\mathcal {A}}(\Delta L^k)\big \Vert _F + \frac{\sqrt{\chi _+(s^*+t)}}{\chi _-(s^*+t) }\big \Vert {\mathcal {A}}(\Delta S^k)\big \Vert _F. \end{aligned}$$
(22)

Let \(\beta ^k\equiv \max \Big (\frac{\pi (2r^*+l,l)}{l(1-\Vert {\mathcal {P}}_{{\mathcal {T}}^\bot }(W^{k-1})\Vert )}, \frac{\varpi (s^*+t,t)}{t\nu _{k-1}T_\mathrm{min}^{k-1}}\Big )\). From Lemma 4.1 and the definition of \(\widehat{\gamma }\),

$$\begin{aligned}&\frac{\pi (2r^*+l,l)}{l}\big \Vert \mathcal{P}_{\mathcal{T}^\bot }(\Delta L^k)\big \Vert _* +\frac{\varpi (s^*+t,t)}{t} \big \Vert \mathcal{P}_{\Omega ^c}(\Delta S^k)\big \Vert _1 \\&\quad \le \beta ^k\left( (1-\Vert {\mathcal {P}}_{{\mathcal {T}}^\bot } (W^{k-1})\Vert )\Vert \mathcal{P}_{\mathcal{T}^\bot }(\Delta L^k)\Vert _* + \nu _{k-1}T_\mathrm{min}^{k-1}\Vert \mathcal{P}_{\Omega ^c} (\Delta S^k)\Vert _1\right) \\&\quad \le \beta ^k\left( \big \Vert W^{k-1} - U_1^*(V_1^*)^{\mathbb {T}}\big \Vert \big \Vert {\mathcal {P}}_{{\mathcal {T}}}(\Delta L^k)\big \Vert _* + \nu _{k-1}T_\mathrm{max}^{k-1}\Vert {\mathcal {P}}_\Omega (\Delta S^k) \Vert _1\right) \\&\quad \le \beta ^k\left( \sqrt{2r^*}\big \Vert W^{k-1} - U_1^*(V_1^*)^{\mathbb {T}}\big \Vert \big \Vert {\mathcal {P}}_{{\mathcal {T}}}(\Delta L^k)\big \Vert _F + \nu _{k-1}\sqrt{s^*} T_\mathrm{max}^{k-1} \Vert {\mathcal {P}}_\Omega (\Delta S^k)\Vert _F\right) \\&\quad \le \widehat{\gamma }\left( \sqrt{2r^*} \xi _{k-1}\big \Vert {\mathcal {P}}_{{\mathcal {K}}}(\Delta L^k)\big \Vert _F + \sqrt{s^*}\eta _{k-1}\Vert {\mathcal {P}}_\Gamma (\Delta S^k)\Vert _F\right) , \end{aligned}$$

where the last inequality is due to \(\beta ^k\le \frac{\widehat{\gamma }}{\min [(1-\Vert {\mathcal {P}}_{{\mathcal {T}}^{\perp }}(W^{k-1})\Vert ),\,\nu _{k-1}T_\mathrm{min}^{k-1}]}\) and \(\Gamma =\Omega \cup \Lambda \). Together with equation (22), we obtain that

$$\begin{aligned}&(1-\widehat{\gamma }\sqrt{2r^*}\xi _{k-1})\Vert {\mathcal {P}}_{{\mathcal {K}}}(\Delta L^k)\Vert _F + (1-\widehat{\gamma }\sqrt{s^*}\eta _{k-1})\Vert {\mathcal {P}}_\Gamma (\Delta S^k)\Vert _F \\&\quad \le \frac{\sqrt{\vartheta _+(2r^*+l)}}{\vartheta _{-}(2r^*+l)} \Vert {\mathcal {A}}(\Delta L^k)\Vert _F +\frac{\sqrt{\chi _+(s^*+t)}}{\chi _-(s^*+t) }\Vert {\mathcal {A}}(\Delta S^k)\Vert _F \\&\quad \le \widetilde{\gamma }\big (\Vert {\mathcal {A}}(\Delta L^k)\Vert _F+\Vert {\mathcal {A}}(\Delta S^k)\Vert _F\big ). \end{aligned}$$

Notice that Assumption 4.1, together with \(0\le \xi _{k-1}<\frac{1}{c_1}\) and \(0\le \eta _{k-1}<\frac{1}{c_2}\), implies that \(1-\widehat{\gamma }\sqrt{2r^*}\xi _{k-1}>0\) and \(1-\widehat{\gamma }\sqrt{s^*}\eta _{k-1}>0\). By the definitions of \(a_{k-1}\) and \(b_{k-1}\), we have

$$\begin{aligned} \Vert {\mathcal {P}}_{{\mathcal {K}}}(\Delta L^k)\Vert _F\le a_{k-1}\big (\Vert {\mathcal {A}}(\Delta L^k)\Vert _F + \Vert {\mathcal {A}}(\Delta S^k)\Vert _F\big )\nonumber \\ \Vert {\mathcal {P}}_\Gamma (\Delta S^k)\Vert _F\le b_{k-1}\big (\Vert {\mathcal {A}}(\Delta L^k)\Vert _F + \Vert {\mathcal {A}}(\Delta S^k)\Vert _F\big ). \end{aligned}$$
(23)

Next we focus on characterizing the bounds of \(\Vert {\mathcal {A}}(\Delta L^k)\Vert _F\) and \(\Vert {\mathcal {A}}(\Delta S^k)\Vert _F\). Since

$$\begin{aligned} \Vert {\mathcal {A}}(\Delta L^k)\Vert _F^2 + \Vert {\mathcal {A}}(\Delta S^k)\Vert _F^2&=\Vert {\mathcal {A}}(\Delta L^k+\Delta S^k)\Vert _F^2-2\langle {\mathcal {A}}^*{\mathcal {A}}(\Delta L^k),\Delta S^k\rangle , \\&\le 4\delta ^2 -2\langle {\mathcal {A}}^*{\mathcal {A}}(\Delta L^k),\Delta S^k\rangle , \end{aligned}$$

we only need to bound \(|\langle \mathcal{Q}(\Delta L^k),\Delta S^k\rangle |\). Notice that \(\Vert \Delta L^k\Vert _\infty \le 2\tau \). We have that

$$\begin{aligned}&|\langle \mathcal{Q}(\Delta L^k),\Delta S^k\rangle | \le \Vert \mathcal{Q}(\Delta L^k)\Vert _\infty \Vert \Delta S^k\Vert _1 \le \Vert {\mathcal {Q}}\Vert _\infty \Vert \Delta L^k\Vert _\infty \Vert \Delta S^k\Vert _1\\&\quad \le 2\tau \Vert {\mathcal {Q}}\Vert _\infty \Vert \Delta S^k\Vert _1. \end{aligned}$$

In addition, from Lemma 4.1 and the definitions of \(\zeta _{k-1}\) and \(\mu _{k-1}\), it follows that

$$\begin{aligned} \Vert \mathcal{P}_{\Omega ^c}(\Delta S^k)\Vert _1 \le \zeta _{k-1}\big \Vert {\mathcal {P}}_{{\mathcal {T}}}(\Delta L^k)\big \Vert _* +\mu _{k-1}\Vert {\mathcal {P}}_\Omega (\Delta S^k)\Vert _1. \end{aligned}$$

From the last three inequalities, it follows that

$$\begin{aligned}&\Vert {\mathcal {A}}(\Delta L^k)\Vert _F^2 + \Vert {\mathcal {A}}(\Delta S^k)\Vert _F^2 \le 4\delta ^2 + 4\tau \Vert \mathcal{Q}\Vert _\infty \Vert \Delta S^k\Vert _1\nonumber \\&\quad \le 4\delta ^2 + 4\tau \Vert \mathcal{Q}\Vert _\infty \Vert \mathcal{P}_{\Omega ^c}(\Delta S^k)\Vert _1 + 4\tau \Vert \mathcal{Q}\Vert _\infty \Vert \mathcal{P}_\Omega (\Delta S^k)\Vert _1\nonumber \\&\quad \le 4\delta ^2 + 4\tau \Vert \mathcal{Q}\Vert _\infty \zeta _{k-1}\big \Vert {\mathcal {P}}_{{\mathcal {T}}}(\Delta L^k)\big \Vert _* + 4\tau \Vert \mathcal{Q}\Vert _\infty (1+\mu _{k-1})\Vert {\mathcal {P}}_\Omega (\Delta S^k)\Vert _1\nonumber \\&\quad \le 4\delta ^2 + 4\tau \Vert \mathcal{Q}\Vert _\infty \big (\zeta _{k-1}\sqrt{2r^*}\Vert \Delta L^k\Vert _F + \sqrt{s^*}(1+\mu _{k-1})\Vert \Delta S^k\Vert _F\big ). \end{aligned}$$
(24)

Combining inequality (23) with inequality (24), we obtain that

$$\begin{aligned}&\Vert {\mathcal {P}}_{{\mathcal {K}}}(\Delta L^k)\Vert _F \\&\quad \le a_{k-1}\sqrt{4\delta ^2 + 4\tau \Vert \mathcal{Q}\Vert _\infty \big (\zeta _{k-1}\sqrt{2r^*}\Vert \Delta L^k\Vert _F + \sqrt{s^*}(1+\mu _{k-1})\Vert \Delta S^k\Vert _F\big )}, \\&\Vert {\mathcal {P}}_{\Gamma }(\Delta S^k)\Vert _F \\&\quad \le b_{k-1}\sqrt{4\delta ^2 + 4\tau \Vert \mathcal{Q}\Vert _\infty \big (\zeta _{k-1}\sqrt{2r^*}\Vert \Delta L^k\Vert _F + \sqrt{s^*}(1+\mu _{k-1})\Vert \Delta S^k\Vert _F\big )}. \end{aligned}$$

Now substituting the bounds of \(\Vert {\mathcal {P}}_{{\mathcal {K}}}(\Delta L^k)\Vert _F\) and \(\Vert {\mathcal {P}}_{\Gamma }(\Delta S^k)\Vert _F\) into (12) gives that

$$\begin{aligned}&\Vert \Delta L^k\Vert _F^2 + \Vert \Delta S^k\Vert _F^2 \\&\quad \le \theta _{k-1}\big [4\delta ^2 + 4\tau \Vert \mathcal{Q}\Vert _\infty \big (\zeta _{k-1}\sqrt{2r^*}\Vert \Delta L^k\Vert _F + \sqrt{s^*}(1+\mu _{k-1})\Vert \Delta S^k\Vert _F\big )\big ]. \end{aligned}$$

Notice that, for \(a,b,c\in {\mathbb {R}}_{+}\) and \(x,y\ge 0\), the inequality \(x^2+y^2\le ax+by +c\) implies \(x+y\le \frac{a+b}{2}+\sqrt{2}\sqrt{c+\frac{a^2+b^2}{4}}\) (see the verification after this proof). Therefore, the last inequality implies that \(\Vert \Delta L^k\Vert _F+\Vert \Delta S^k\Vert _F\le \frac{a+b}{2}+\sqrt{2}\sqrt{c+\frac{a^2+b^2}{4}}\) with

$$\begin{aligned} a=4\tau \Vert \mathcal{Q}\Vert _\infty \sqrt{2r^*}\theta _{k-1}\zeta _{k-1},\ b=4\tau \Vert \mathcal{Q}\Vert _\infty \sqrt{s^*}\theta _{k-1}(1+\mu _{k-1}),\ c=4\delta ^2\theta _{k-1}. \end{aligned}$$

A suitable rearrangement yields the desired result. The proof is then completed. \(\square \)
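For completeness, the elementary inequality invoked at the end of the above proof follows by completing the square: for \(x,y\ge 0\) and \(a,b,c\in {\mathbb {R}}_{+}\), the inequality \(x^2+y^2\le ax+by+c\) is equivalent to

$$\begin{aligned} \Big (x-\frac{a}{2}\Big )^2+\Big (y-\frac{b}{2}\Big )^2\le c+\frac{a^2+b^2}{4}, \end{aligned}$$

and then the Cauchy–Schwarz bound \(u+v\le \sqrt{2}\sqrt{u^2+v^2}\) with \(u=x-\frac{a}{2}\) and \(v=y-\frac{b}{2}\) gives

$$\begin{aligned} x+y\le \frac{a+b}{2}+\sqrt{2}\sqrt{\Big (x-\frac{a}{2}\Big )^2+\Big (y-\frac{b}{2}\Big )^2}\le \frac{a+b}{2}+\sqrt{2}\sqrt{c+\frac{a^2+b^2}{4}}. \end{aligned}$$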


Cite this article

Han, L., Bi, S. & Pan, S. Two-stage convex relaxation approach to least squares loss constrained low-rank plus sparsity optimization problems. Comput Optim Appl 64, 119–148 (2016). https://doi.org/10.1007/s10589-015-9797-6

