Abstract
In this paper we consider the rank and zero-norm regularized least squares loss minimization problem with a spectral norm ball constraint. For this class of NP-hard optimization problems, we propose a two-stage convex relaxation approach by majorizing suitable locally Lipschitz continuous surrogates. Furthermore, the Frobenius norm error bound for the optimal solution of each stage is characterized, and a theoretical guarantee is established for the two-stage convex relaxation approach by showing that the error bound of the first-stage convex relaxation (i.e., the nuclear norm and \(\ell _1\)-norm regularized minimization problem) can be significantly reduced by the second-stage convex relaxation under a suitable restricted eigenvalue condition. We also verify the efficiency of the proposed approach on random test problems and on real problems.
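The first-stage relaxation trades the rank and zero norm for the nuclear norm and \(\ell_1\)-norm, whose proximal maps are soft-thresholding operators; the second stage then re-solves with the penalty weakened where the first stage detected signal. The following is a minimal pure-Python sketch of this two-stage idea on the \(\ell_1\) part alone; the data, the threshold `lam`, and the simple support-based reweighting rule are hypothetical illustrations, not the paper's actual algorithm.

```python
# Sketch of the two-stage idea on the l1 part only (hypothetical toy setup):
# stage 1 applies the plain soft-thresholding prox of lam*||.||_1; stage 2
# drops the penalty on entries whose stage-1 magnitude is nonzero, mimicking
# the effect of majorizing a locally Lipschitz surrogate of the zero norm.

def soft_threshold(x, lam):
    """Proximal operator of lam*|.|: shrinks x toward 0 by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def two_stage_weights(stage1_sol, eps=1e-6):
    """Second-stage weights: 0 on the detected support, 1 elsewhere."""
    return [0.0 if abs(v) > eps else 1.0 for v in stage1_sol]

# Toy data: noisy observation of a sparse vector.
obs = [3.0, -0.2, 0.1, -2.5]
lam = 0.5

stage1 = [soft_threshold(v, lam) for v in obs]                   # biased estimate
w = two_stage_weights(stage1)
stage2 = [soft_threshold(v, lam * wi) for v, wi in zip(obs, w)]  # debiased

print(stage1)  # support entries shrunk by lam, noise zeroed
print(stage2)  # shrinkage removed on the detected support
```

On this toy input, stage one shrinks the two large entries by `lam` (a bias) while zeroing the noise entries, and stage two removes that shrinkage on the detected support, which is the qualitative error reduction the two-stage scheme aims for.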
Additional information
Supported by the National Natural Science Foundation of China under Project Nos. 11501219 and 11701186, the Guangdong Natural Science Funds under Project Nos. 2015A030310298 and 2017A030310418, the Open Project Program of the State Key Lab of CAD and CG (Grant No. A1613), Zhejiang University, and the Fundamental Research Funds for the Central Universities (SCUT).
Appendix
1.1 The proof of Lemma 2.3
We first prove inequality (5). Without loss of generality, assume that \(n_1<n_2\) and that the index set \(\Omega \) takes the form
and all components \(G_{ij}\) with \(i+n_1(j-1)>s^*\) are arranged in descending order of \(|G_{ij}|\):
where \(i_0\) and \(j_0\) are nonnegative integers such that \(i_0+n_1(j_0-1)=s^*\). For \(k=1,2,\ldots \), let
except that the largest column index in the last block stops at \(n_2\). From the definition of these index sets, it is immediate that \(\Gamma =\Omega \cup \Omega _1\). Notice that \(\Vert \mathcal {P}_{\Omega _k}(G)\Vert _{\infty }\le \frac{\Vert \mathcal {P}_{\Omega _{k-1}}(G)\Vert _{1}}{t}\) for \(k>1\), which implies that \(\sum _{k>1}\Vert \mathcal {P}_{\Omega _k}(G)\Vert _{\infty }\le \frac{\Vert \mathcal {P}_{\Omega }(G)\Vert _{1}}{t}\). Hence,
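The bound just used rests on a pigeonhole argument: once the entries are sorted in descending order of magnitude and grouped into consecutive size-\(t\) blocks, the largest entry of each block is at most the average of the preceding block. A small pure-Python illustration with hypothetical data:

```python
# Numeric illustration of the block bound: sort entries by decreasing
# magnitude and split into consecutive blocks of size t; each block's max
# is bounded by the previous block's average (all data hypothetical).

def blocks(sorted_vals, t):
    return [sorted_vals[i:i + t] for i in range(0, len(sorted_vals), t)]

vals = sorted([3.0, 2.5, 2.5, 1.0, 0.9, 0.8, 0.2, 0.1], reverse=True)
t = 3
B = blocks(vals, t)

for prev, cur in zip(B, B[1:]):
    # entries are sorted, so max(cur) <= min(prev) <= sum(prev)/t
    assert max(cur) <= sum(prev) / t

# Summing over the tail blocks gives the l_inf-sum vs l1 bound used above.
assert sum(max(b) for b in B[1:]) <= sum(vals) / t
print("block maxima bounded by previous-block averages")
```

Note that every block except possibly the last has exactly \(t\) entries, which is why dividing the preceding block's sum by \(t\) is valid in the loop above.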
Combining the last inequality with the following inequality
we immediately obtain the desired result (5).
Next, we prove inequality (6). Let \(\widehat{H}\) be an arbitrary matrix from \(\mathcal {I}\). By the definition of \(\vartheta _+(2r^* +l)\), \(\Vert \mathcal {A}\mathcal {P}_{\mathcal {I}}(\widehat{H}-\widehat{G})\Vert _F^2 \le \vartheta _+(2r^* +l)\Vert \mathcal {P}_{\mathcal {I}}(\widehat{H}-\widehat{G})\Vert _F^2\) and \(\Vert \mathcal {A}(\widehat{H})\Vert _F^2\le \vartheta _+(2r^* +l)\Vert \widehat{H}\Vert _F^2\). Then,
We proceed by considering the following two cases.
Case 1: \(\mathrm{rank}((U_{2}^*)^{\mathbb {T}}\widehat{G}V_{2}^*)\le l\le \min (n_1 - r^*,n_2 - r^* )\). Now, by the expression of \(\mathcal {P}_{\mathcal {J}_1}\), we have
where the last two equalities are due to \((U_{2}^*)^{\mathbb {T}}\widehat{G}V_{2}^* =P_{1}[\mathrm{Diag}\left( \sigma ((U_{2}^*)^{\mathbb {T}}\widehat{G}V_{2}^*)\right) \ \ 0]Q_{1}^{\mathbb {T}}\). Note that \(\mathcal {P}_{\mathcal {T}^{\perp }}(\widehat{G})=U_{2}^*(U_{2}^*)^{\mathbb {T}}\widehat{G}V_{2}^*(V_{2}^*)^{\mathbb {T}}\) by (4). So, \(\mathcal {P}_{\mathcal {T}^{\perp }}(\widehat{G})=\mathcal {P}_{\mathcal {J}_1}(\widehat{G})\), i.e., \(\widehat{G}\in \mathcal {I}\). Then,
This inequality implies the desired result (6). Thus, we complete the proof for this case.
Case 2: \(l<\mathrm{rank}((U_{2}^*)^{\mathbb {T}}\widehat{G}V_{2}^*)\). Let k be the smallest positive integer such that \(kl\ge \min (n_1 - r^*,n_2 - r^*)\). Clearly, \(k\ge 2\). Let \(l_{i}\) and \(\widetilde{l}_{i}\) for \(i=1,2,\ldots ,k\) be such that
For each \(2\le i\le k\), we define the subspace \(\mathcal {J}_i:=\big \{U_{2}^*P_iZ(V_{2}^*Q_{i})^{\mathbb {T}}\ |\ Z\in \mathbb {R}^{l_{i}\times \widetilde{l}_{i} }\big \}\), where \(P_i\in \mathbb {O}^{(n_1 - r^*)\times l_{i}}\) is the matrix consisting of the \(\left( \sum _{j=1}^{i-1}l_{j}+1\right) \)th column to the \(\left( \sum _{j=1}^{i}l_{j}\right) \)th column of P; and \(Q_i\in \mathbb {O}^{(n_2 - r^*)\times \widetilde{l}_{i}}\) is the matrix consisting of the \(\left( \sum _{j=1}^{i-1}\widetilde{l}_{j}+1\right) \)th column to the \(\left( \sum _{j=1}^{i}\widetilde{l}_{j}\right) \)th column of Q. Clearly, \(\mathcal {J}_1\perp \mathcal {J}_i\) for \(i\ge 2\). For each \(i\ge 1\), it is easy to calculate that
This, together with \(\mathcal {P}_{\mathcal {T}^{\perp }}(\widehat{G})=U_{2}^*(U_{2}^*)^{\mathbb {T}}\widehat{G}V_{2}^*(V_{2}^*)^{\mathbb {T}}\), implies that \( \mathcal {P}_{\mathcal {T}^{\perp }}(\widehat{G}) =\sum _{i=1}^k\mathcal {P}_{\mathcal {J}_i}(\widehat{G}). \) Then, \(\langle \widehat{H},\mathcal {A}^*\mathcal {A}(\widehat{G})\rangle =\langle \widehat{H},\mathcal {A}^*\mathcal {A}\mathcal {P}_{\mathcal {I}}(\widehat{G})\rangle +{\textstyle {\sum _{i>1}\big \langle \widehat{H},\mathcal {A}^*\mathcal {A}\mathcal {P}_{\mathcal {J}_i}(\widehat{G})\big \rangle }}\). Consequently, we have that
where the first inequality uses the definition of \(\pi \) together with the facts that \(\widehat{H}\in \mathcal {I}\), \(\mathcal {P}_{\mathcal {J}_i}(\widehat{G})\in \mathcal {J}_i\) with \(\mathrm{rank}(\mathcal {P}_{\mathcal {J}_i}(\widehat{G}))\le l\), and \(\mathcal {I}\perp \mathcal {J}_i\) for \(i>1\); and the second inequality is due to
since \(\Vert \mathcal {P}_{\mathcal {J}_{i+1}}(\widehat{G})\Vert \le l^{-1}\Vert \mathcal {P}_{\mathcal {J}_i}(\widehat{G})\Vert _*\). Combining (26) with (25), we obtain inequality (6), which completes the proof.
1.2 The proof of Lemma 4.2
(a) Notice that \(\mathcal{P}_\mathcal{K}(\Delta L^k)=\Delta L^k - \mathcal{P}_{\mathcal {T}^\bot }(\Delta L^k) + \mathcal{P}_\mathcal{H}(\Delta L^k)\) implies \(L_q=q(\mathcal{P}_{\mathcal {T}^\bot }(\Delta L^k) - \mathcal{P}_\mathcal{H}(\Delta L^k)+ L^* ) +(1-q)L^k\). It suffices to argue that \(\Vert \mathcal{P}_{\mathcal {T}^\bot }(\Delta L^k) - \mathcal{P}_\mathcal{H}(\Delta L^k) + L^*\Vert \le \tau \). By the expression of \(\mathcal {P}_{\mathcal {T}^{\perp }}(\Delta L^k)\) and the SVD of \((U_2^*)^\mathbb {T}\Delta L^kV_2^*\), we have that
From the definition of the subspace \(\mathcal {H}\), we have \(\mathcal {P}_{\mathcal {H}}(Z)=U_{2}^*P_1P_1^{\mathbb {T}}(U_{2}^*)^{\mathbb {T}} ZV_2^*Q_1Q_1^\mathbb {T}(V_2^*)^\mathbb {T}\) for any \(Z\in \mathbb {R}^{n_1\times n_2}\). Together with the last equation, it follows that
where \(\sigma ^{\downarrow ,l}(Z)\) means the vector consisting of the first l components of \(\sigma (Z)\). Thus, we have that
where \(\sigma _{l+1}\ge \cdots \ge \sigma _n\) are the smallest \(n-l\) singular values of \((U_2^*)^{\mathbb {T}} \Delta L^k V_2^*\). Then, \(\mathcal{P}_{\mathcal {T}^\bot }(\Delta L^k) - \mathcal{P}_\mathcal{H}(\Delta L^k) + L^*\) has the SVD as
Note that \(\sigma _{l+1}\le \Vert \mathcal{P}_{\mathcal {T}^\bot }(\Delta L^k)\Vert =\Vert \mathcal{P}_{\mathcal {T}^\bot }(L^k)\Vert \le \tau \). The last equation implies that \(\Vert \mathcal{P}_{\mathcal {T}^\bot }(\Delta L^k) - \mathcal{P}_\mathcal{H}(\Delta L^k) + L^*\Vert \le \tau \). Thus, we show that \(\Vert L_q\Vert \le \tau \).
(b) Equations (27) and (28) imply that \(\Vert \mathcal {P}_{\mathcal {T}^\bot }(\Delta L^k )- \mathcal {P}_{\mathcal {H}}(\Delta L^k)\Vert \le l^{-1}\Vert \mathcal {P}_{\mathcal {H}}(\Delta L^k)\Vert _*\) and \(\Vert \mathcal {P}_{\mathcal {T}^\bot }(\Delta L^k )- \mathcal {P}_{\mathcal {H}} (\Delta L^k)\Vert _* =\Vert \mathcal {P}_{\mathcal {T}^\bot }(\Delta L^k )\Vert _*-\Vert \mathcal {P}_{\mathcal {H}} (\Delta L^k)\Vert _*\). Then, we have that
where the second equality uses \(\mathcal {P}_{\mathcal {K}}(\Delta L^k)=\mathcal {P}_{\mathcal {T}}(\Delta L^k)+\mathcal {P}_{\mathcal {H}}(\Delta L^k)\), and the last inequality uses the fact that \(xy\le \frac{(x+y)^2}{4}\) for any \(x,y\in \mathbb {R}\). In addition,
Let \(\gamma ^{k-1}:=\min ( a_{k-1},b_{k-1})\). From the last two inequalities, we have
where the second inequality uses the fact that \(x^2+y^2 \le (x+y)^2\) for any \(x,y\in \mathbb {R}_+ \), and the fourth inequality uses Lemma 4.1. Notice that \(\Vert \Delta L^k\Vert _F^2=\Vert \mathcal {P}_{\mathcal {K}}(\Delta L^k)\Vert _F^2+\Vert \mathcal {P}_{\mathcal {K}^{\perp }}(\Delta L^k)\Vert _F^2\) and \(\Vert \Delta S^k\Vert _F^2=\Vert \mathcal{P}_\Gamma (\Delta S^k)\Vert _F^2+\Vert \mathcal{P}_{\Gamma ^c}(\Delta S^k)\Vert _F^2\). From inequality (30), we readily obtain the desired result. The proof is completed.
1.3 The proof of Theorem 4.1
Let the subspaces \(\mathcal {H}\) and \(\mathcal {K}\) and the index sets \(\Gamma \) and \(\Lambda \) be defined as in Lemma 4.2. Let \(L_q:=L^k - q \mathcal{P}_\mathcal{K}(\Delta L^k )\) and \(S_q:=S^k - q \mathcal{P}_{\Gamma }(\Delta S^k)\) for any \(q\in (0,1)\). By Lemma 4.2 (a), the point \((L_q,S_q)\) is feasible for problem (13). Together with the optimality of \((L^k,S^k)\) for problem (13), this implies that
Then we get that
From the proof of Lemma 4.2 (a), \( U_1^*(V_1^*)^\mathbb {T} +U_2^* P \mathrm{Diag}(e) Q^\mathbb {T}(V_{2}^*)^\mathbb {T}\) is a subgradient of the nuclear norm at \(L^* + \mathcal {P}_{\mathcal {T}^\bot }(\Delta L^k) -\mathcal {P}_{\mathcal {H}}(\Delta L^k)\). Similarly, \( \mathrm{sgn}(S^*) + \mathrm{sgn}(\mathcal{P}_{\Omega ^c} (\Delta S^k))\) is a subgradient of the \(\ell _1\) norm at \(T^{k-1}\circ (S^*+\mathcal{P}_{\Omega ^c}(\Delta S^k)-\mathcal{P}_\Lambda (\Delta S^k))\). Thus, it follows that
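The subgradient facts invoked here can be sanity-checked numerically for the \(\ell_1\) norm: \(\mathrm{sgn}(X)\) always lies in the subdifferential at \(X\) (its zero entries satisfy \(0\in[-1,1]\)), so \(\Vert Y\Vert_1\ge \Vert X\Vert_1+\langle \mathrm{sgn}(X),Y-X\rangle\) for every \(Y\). A minimal pure-Python check on hypothetical matrices, flattened to lists for simplicity:

```python
# Check the l1-norm subgradient inequality ||Y||_1 >= ||X||_1 + <sgn(X), Y - X>
# on a few hypothetical 2x2 matrices (flattened to lists).

def sgn(x):
    return (x > 0) - (x < 0)

def l1(m):
    return sum(abs(v) for v in m)

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

X = [1.5, 0.0, -2.0, 0.0]   # sparse "S*"-like point; sgn is 0 on zero entries
g = [sgn(v) for v in X]
for Y in ([1.0, 0.3, -2.5, 0.0], [0.0, 0.0, 0.0, 0.0], [-1.0, 1.0, 1.0, -1.0]):
    lhs = l1(Y)
    rhs = l1(X) + inner(g, [y - x for x, y in zip(X, Y)])
    assert lhs >= rhs - 1e-12   # subgradient inequality holds
print("sgn(X) is a subgradient of the l1 norm at X")
```

The same convexity inequality with the nuclear-norm subgradient is what turns the optimality of \((L^k,S^k)\) into the bound derived next.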
Note that
where the second equality is obtained by \(\mathcal {K}:=\mathcal {T}\oplus \mathcal {H}\) and (29). In addition, it holds
and
Combining (31)–(34), we get that
where \(\delta _{k-1} := \Vert \mathcal{P}_\mathcal{T}(U_1^* (V_1^*)^\mathbb {T}-W^{k-1})\Vert _F\) and \(\overline{\delta }_{k-1} := \Vert \mathcal{P}_\Omega (T^{k-1})\Vert _F\). On the other hand, for any \(q\in (0,1)\), we have that
Combining (35) and (36), and taking the limit \(q\rightarrow 0^+\), we obtain that
Notice that \(\Vert \Delta L^k\Vert _\infty \le \Vert \Delta L^k\Vert \le 2\tau \). We have that
Similarly, we get that \( |\langle \Delta S^k, \mathcal{Q}(\mathcal{P}_\mathcal{K}(\Delta L^k))\rangle | \le 2\tau \Vert \mathcal {Q}\Vert _\infty \Vert \Delta S^k\Vert _1. \) Together with (37), it is immediate to obtain that
Using Lemma 2.3 with \(G=\Delta S^k\), \(H=\mathcal {P}_{\Gamma }(\Delta S^k)\), and \(\mathcal {J}_1=\mathcal {H}\), \(\mathcal {I}=\mathcal {K},\widehat{G}=\Delta L^{k}\), \(\widehat{H}=\mathcal {P}_{\mathcal {K}}(\Delta L^k)\) yields that \(\Vert \mathcal{P}_\Gamma (G-H)\Vert _F=0, \Vert \mathcal{P}_{\mathcal {I}}(\widehat{G}-\widehat{H})\Vert _F=0\) and
where the second inequality uses the eigenvalue bounds of the linear operator \(\mathcal{A}\) from Sect. 2, and the last inequality uses the fact that \(xy\le \frac{x^2/z+z y^2}{2}\) for any \(x,y,z\in \mathbb {R}_+ \). Let \( \beta ^k\equiv \max \big (\frac{\sqrt{ \vartheta _{+}^2(2r^*+l) \pi (2r^*+l,l)}}{\sqrt{l \vartheta _{-}(2r^*+l)}a_{k-1}},\frac{\sqrt{ \chi _+^2(s^*+t) \varpi (s^*+t,t)}}{\sqrt{t \chi _-(s^*+t)}b_{k-1}}\big ). \) From Lemma 4.1 and the definition of \(\widetilde{\gamma }:=\max \big (\frac{\vartheta _{+}^2(2r^*+l) \pi (2r^*+l,l)}{l\vartheta _{-}(2r^*+l)},\frac{ \chi _+^2(s^*+t) \varpi (s^*+t,t)}{t\chi _-(s^*+t)}\big )\),
where the second-to-last inequality is due to \(\beta ^k\le \sqrt{\widetilde{\gamma }}/ \min (a_{k-1},b_{k-1})\) and the definitions \(\xi _{k-1}:=\widehat{a}_{k-1}/ \min ( a_{k-1} , b_{k-1})\) and \(\eta _{k-1}:=\widehat{b}_{k-1}/ \min ( a_{k-1}, b_{k-1})\), and the last inequality uses the fact that \(x+y\le \sqrt{2(x^2+y^2)}\) for any \(x,y\in \mathbb {R}\). From Lemma 4.1 and the definitions \(\zeta _{k-1}:=\widehat{a}_{k-1}/b_{k-1}\) and \(\mu _{k-1}:=\widehat{b}_{k-1}/ b_{k-1}\), it follows that
Combining inequalities (38)–(40) and noticing that \(x^2+y^2 \le (x+y)^2\) holds for any \(x,y\in \mathbb {R}_+\), we obtain that
Combining Assumption 4.1 with Lemmas 2.1 and 2.2 gives \(\frac{\varpi (s^*+t,t)}{2t}\le c_2 \) and \(\frac{\pi (2r^*+l,l)}{2l}\le c_1\), and hence \(\widetilde{\gamma }\le 2\max \big (c_1 \frac{\vartheta _{+}^2(2r^*+l)}{\vartheta _{-}(2r^*+l)} , c_2 \frac{\chi _+^2(s^*+t)}{\chi _-(s^*+t)}\big ) \), which implies
where
and the assumptions on \(\xi _{k-1}^2\) and \(\eta _{k-1}^2\) are used. Hence, the coefficients on the left-hand side of (41) are positive.
Then, by the definitions of \(\widetilde{a}_{k-1},\widetilde{b}_{k-1},\overline{a}_{k-1}\) and \(\overline{b}_{k-1}\) in (21), we rewrite (41) as
where \(\widetilde{a}_{k-1},\widetilde{b}_{k-1}>0\). It is not hard to obtain that
which implies
and
Substituting the bounds on \(\Vert \mathcal {P}_{\mathcal {K}}(\Delta L^k)\Vert _F\) and \(\Vert \mathcal {P}_{\Gamma }(\Delta S^k)\Vert _F\) into (18) gives the desired result. This completes the proof.
Han, L., Bi, S. Two-stage convex relaxation approach to low-rank and sparsity regularized least squares loss. J Glob Optim 70, 71–97 (2018). https://doi.org/10.1007/s10898-017-0573-2