Abstract
This paper is concerned with least squares loss constrained low-rank plus sparsity optimization problems, which seek a low-rank matrix and a sparse matrix by minimizing a positive combination of the rank function and the zero norm over a least squares constraint set that encodes the observations or prior information on the target matrix pair. For this class of NP-hard optimization problems, we propose a two-stage convex relaxation approach via the majorization of suitable locally Lipschitz continuous surrogates, which offers a remarkable advantage in reducing the error incurred by the popular nuclear norm plus \(\ell _1\)-norm convex relaxation method. Under a suitable restricted eigenvalue condition, we establish a Frobenius norm error bound for the optimal solution of each stage and show that the error bound of the first-stage convex relaxation (i.e., the nuclear norm plus \(\ell _1\)-norm convex relaxation) can be substantially reduced by the second-stage convex relaxation, thereby providing a theoretical guarantee for the two-stage approach. We also verify the efficiency of the proposed approach by applying it to random test problems and to problems with real data arising from specularity removal in face images and foreground/background separation in surveillance videos.
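Schematically, the problem studied here can be written as follows (a sketch for orientation only; the sampling operator \({\mathcal {A}}\), the observation \(b\), the noise level \(\delta \) and the weights \(\mu ,\lambda >0\) are generic placeholders rather than the paper's exact notation):
\[
\min _{L,S\in {\mathbb {R}}^{n_1\times n_2}}\ \mu \,\mathrm{rank}(L)+\lambda \Vert S\Vert _0\quad \text{s.t.}\quad \Vert {\mathcal {A}}(L+S)-b\Vert \le \delta ,
\]
whose first-stage (nuclear norm plus \(\ell _1\)-norm) convex relaxation replaces \(\mathrm{rank}(L)\) by \(\Vert L\Vert _*\) and \(\Vert S\Vert _0\) by \(\Vert S\Vert _1\).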
Acknowledgments
The authors would like to thank the two anonymous referees for their helpful suggestions on the revision of the original manuscript. This work was supported by the National Natural Science Foundation of China under project Nos. 11501219 and 11571120, the Natural Science Foundation of Guangdong Province under project Nos. 2015A030313214 and 2015A030310298, and the Fundamental Research Funds for the Central Universities (SCUT).
Appendix
Lemma 6.1
Let \({\mathcal {T}}\) be the subspace given by (3). Then, for any \(Z\in {\mathbb {R}}^{n_1\times n_2}\), it holds that
Proof
Let \({\mathcal {B}}:{\mathbb {R}}^{n_1\times n_2}\rightarrow {\mathbb {R}}^{n_1\times n_2}\) be defined by \({\mathcal {B}}(X)= U_1^*(U_1^*)^{{\mathbb {T}}}XV_2^* (V_2^*)^{{\mathbb {T}}}+XV_1^*(V_1^*)^{{\mathbb {T}}}\) for \(X\in {\mathbb {R}}^{n_1\times n_2}\). Then, from the definition of the subspace \({\mathcal {T}}\), it follows that
By the necessary optimality conditions of (20), there exists \(Y\in {\mathbb {R}}^{n_1\times n_2}\) such that
By the expression of \({\mathcal {B}}\), it is easy to check that \({\mathcal {B}}\) is self-adjoint and \({\mathcal {B}}({\mathcal {B}}(X))={\mathcal {B}}(X)\) for any \(X\in {\mathbb {R}}^{n_1\times n_2}\). Together with the last equation, we deduce that
which implies the expression of \({\mathcal {P}}_{{\mathcal {T}}}(Z)\). The same arguments yield the expression of \({\mathcal {P}}_{{\mathcal {T}}^{\perp }}(Z)\). This completes the proof. \(\square \)
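For reference, the expressions established in this lemma read as follows (a reconstruction from the definition of \({\mathcal {B}}\) above; here \(U_2^*\) denotes an orthonormal basis of the complement of the column space of \(U_1^*\), an assumption consistent with the notation \(V_2^*\)):
\[
{\mathcal {P}}_{{\mathcal {T}}}(Z)=U_1^*(U_1^*)^{{\mathbb {T}}}ZV_2^*(V_2^*)^{{\mathbb {T}}}+ZV_1^*(V_1^*)^{{\mathbb {T}}},\qquad
{\mathcal {P}}_{{\mathcal {T}}^{\perp }}(Z)=U_2^*(U_2^*)^{{\mathbb {T}}}ZV_2^*(V_2^*)^{{\mathbb {T}}}.
\]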
The proof of Lemma 2.2
It suffices to consider the case where the right-hand side is positive (otherwise the inequality is trivial), which implies that \(\langle H,{\mathcal {A}}^*{\mathcal {A}}(G)\rangle >0\) and hence \(\langle H,{\mathcal {A}}^*{\mathcal {A}}(H)\rangle >0\). Without loss of generality, we assume that \(n_1\le n_2\) and that \(\Omega \) takes the form
where \(\lfloor \frac{s^*}{n_1}\rfloor \) denotes the largest integer not exceeding \(\frac{s^*}{n_1}\), and all components \(G_{ij}\) with \(i+(j-1)n_1>s^*\) are arranged in descending order by the column index, i.e.,
where \(i_0\) and \(j_0\) are positive integers such that \(i_0+n_1(j_0-1)=s^*\). For \(k=1,2,\ldots \), let
except that the largest column index in the last block stops at \(n_2\). Comparing with (21), we see immediately that \(\Omega =\Omega _0\) and \(\Lambda =\Omega _1\), and consequently \(\Gamma =\Omega _0\cup \Omega _1\). In addition, from the ordering of \(|G_{ij}|\) for \(i+(j-1)n_1>s^*\), we have \(\Vert {\mathcal {P}}_{\Omega _k}(G)\Vert _{\infty }\le \frac{\Vert {\mathcal {P}}_{\Omega _{k-1}}(G)\Vert _{1}}{t}\) for \(k>1\), which implies that \(\sum _{k>1}\Vert {\mathcal {P}}_{\Omega _k}(G)\Vert _{\infty }\le \frac{\Vert {\mathcal {P}}_{\Omega ^{c}}(G)\Vert _{1}}{t}\). Then, we have that
where the second equality uses \(\Gamma =\Omega _0\cup \Omega _1\), the third is due to \(\langle H,{\mathcal {A}}^*{\mathcal {A}}(H)\rangle >0\), and the first inequality follows from the definition of \(\varpi (\cdot ,\cdot )\). Combining the last inequality with
we immediately obtain the desired result. This completes the proof. \(\square \)
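The block-decomposition step above is the standard "shifting" argument from sparse recovery; the following minimal numerical sketch (Python; all names are illustrative rather than taken from the paper) checks the bound \(\sum _{k>1}\Vert {\mathcal {P}}_{\Omega _k}(G)\Vert _{\infty }\le \Vert {\mathcal {P}}_{\Omega ^{c}}(G)\Vert _{1}/t\) on random data:

    import numpy as np

    # Entries of G outside the support are sorted by decreasing magnitude and
    # grouped into consecutive blocks Omega_1, Omega_2, ... of size t. Since
    # the largest entry of block k (k > 1) is at most the average entry of
    # block k-1, summing gives the bound checked below.
    rng = np.random.default_rng(2016)
    g = np.abs(rng.standard_normal(500))   # magnitudes |G_ij| on Omega^c
    t = 20                                 # block size, as in the proof

    order = np.sort(g)[::-1]               # descending order, as in the proof
    nblk = int(np.ceil(order.size / t))
    blocks = [order[k * t:(k + 1) * t] for k in range(nblk)]

    lhs = sum(b.max() for b in blocks[1:]) # sum of sup norms over k > 1
    rhs = order.sum() / t                  # ||P_{Omega^c}(G)||_1 / t
    assert lhs <= rhs
    print(f"{lhs:.4f} <= {rhs:.4f}")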
The proof of Theorem 4.1
Let the subspaces \({\mathcal {H}}\) and \({\mathcal {K}}\) and the index sets \(\Gamma \) and \(\Lambda \) be defined as in Lemma 4.2. Using Lemma 2.4 with \({\mathcal {J}}_1={\mathcal {H}},\ {\mathcal {I}}={\mathcal {K}},\ G=\Delta L^{k}\) and \(H={\mathcal {P}}_{{\mathcal {K}}}(\Delta L^k)\) yields that
Using Lemma 2.2 with \(H={\mathcal {P}}_{\Gamma }(\Delta S^k)\) and \(G=\Delta S^k\) then yields that
In addition, from the definitions of \(\vartheta _{+}(\cdot )\) and \(\chi _{+}(\cdot )\), it follows that
From the above four inequalities, we immediately obtain that
Let \(\beta ^k\equiv \max \Big (\frac{\pi (2r^*+l,l)}{l(1-\Vert {\mathcal {P}}_{{\mathcal {T}}^\bot }(W^{k-1})\Vert )}, \frac{\varpi (s^*+t,t)}{t\nu _{k-1}T_\mathrm{min}^{k-1}}\Big )\). From Lemma 4.1 and the definition of \(\widehat{\gamma }\),
where the last inequality is due to \(\beta ^k\le \frac{\widehat{\gamma }}{\min [(1-\Vert {\mathcal {P}}_{{\mathcal {T}}^{\perp }}(W^{k-1})\Vert ),\,\nu _{k-1}T_\mathrm{min}^{k-1}]}\) and \(\Gamma =\Omega \cup \Lambda \). Together with equation (22), we obtain that
Notice that Assumption 4.1, together with \(0\le \xi _{k-1}<\frac{1}{c_1}\) and \(0\le \eta _{k-1}<\frac{1}{c_2}\), implies that \(1-\widehat{\gamma }\sqrt{2r^*}\xi _{k-1}>0\) and \(1-\widehat{\gamma }\sqrt{s^*}\eta _{k-1}>0\). By the definitions of \(a_{k-1}\) and \(b_{k-1}\), we have
Next we focus on characterizing the bounds of \(\Vert {\mathcal {A}}(\Delta L^k)\Vert _F\) and \(\Vert {\mathcal {A}}(\Delta S^k)\Vert _F\). Since
it suffices to bound \(|\langle \mathcal{Q}(\Delta L^k),\Delta S^k\rangle |\). Since \(\Vert \Delta L^k\Vert _\infty \le 2\tau \), we have that
In addition, from Lemma 4.1 and the definitions of \(\zeta _{k-1}\) and \(\mu _{k-1}\), it follows that
From the last three inequalities, it follows that
Combining inequality (23) with inequality (24), we obtain that
Now substituting the bounds of \(\Vert {\mathcal {P}}_{{\mathcal {K}}}(\Delta L^k)\Vert _F\) and \(\Vert {\mathcal {P}}_{\Gamma }(\Delta S^k)\Vert _F\) into (12) gives that
Notice that \(x^2+y^2\le ax+by +c\) for \(a,b,c\in {\mathbb {R}}_{+}\) implies \(x+y\le \frac{a+b}{2}+\sqrt{2}\sqrt{c+\frac{a^2+b^2}{4}}\). Therefore, the last inequality implies that \(\Vert \Delta L^k\Vert _F+\Vert \Delta S^k\Vert _F\le \frac{a+b}{2}+\sqrt{2}\sqrt{c+\frac{a^2+b^2}{4}}\) with
A suitable rearrangement then yields the desired result. This completes the proof. \(\square \)
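For completeness, the elementary implication invoked in the final step can be verified by completing squares: from \(x^2+y^2\le ax+by+c\) one obtains
\[
\Big(x-\tfrac{a}{2}\Big)^2+\Big(y-\tfrac{b}{2}\Big)^2\le c+\tfrac{a^2+b^2}{4},
\]
and hence, since \(u+v\le \sqrt{2}\sqrt{u^2+v^2}\) for all \(u,v\in {\mathbb {R}}\),
\[
x+y\le \tfrac{a+b}{2}+\sqrt{2}\sqrt{\Big(x-\tfrac{a}{2}\Big)^2+\Big(y-\tfrac{b}{2}\Big)^2}\le \tfrac{a+b}{2}+\sqrt{2}\sqrt{c+\tfrac{a^2+b^2}{4}}.
\]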
Cite this article
Han, L., Bi, S. & Pan, S. Two-stage convex relaxation approach to least squares loss constrained low-rank plus sparsity optimization problems. Comput Optim Appl 64, 119–148 (2016). https://doi.org/10.1007/s10589-015-9797-6