Abstract
The problem of reconstructing an n-by-n structured matrix signal \(\mathrm{X}=(\mathbf{x}_1,\ldots,\mathbf{x}_n)\) via convex optimization is investigated, where each column \(\mathbf{x}_j\) is an s-sparse vector and all columns have the same \(l_1\)-norm value. In this paper, the convex programming problem is solved for both noise-free and noisy measurements. Uniform sufficient conditions are established which are very close to the necessary conditions, and non-uniform conditions are also discussed. In addition, stronger conditions are investigated to guarantee the reconstructed signal's support stability, sign stability and approximation-error robustness. Moreover, with the convex-geometric approach in the random measurement setting, one of the critical ingredients of this contribution is to estimate bounds on the related widths in the Gaussian and non-Gaussian cases. These bounds are explicitly controlled by the signal's structural parameters r and s, which determine the matrix signal's column-wise sparsity and \(l_1\)-column-flatness respectively. This paper provides a relatively complete theory of column-wise sparse and \(l_1\)-column-flat matrix signal reconstruction, as well as a heuristic foundation for dealing with more complicated high-order tensor signals in, e.g., statistical big-data analysis and related data-intensive applications.
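To make the signal class concrete, here is a small illustrative sketch (not from the paper; the construction and helper names are our own) of a column-wise s-sparse, \(l_1\)-column-flat matrix, together with the max-column-\(l_1\) functional that the reconstruction programs minimize:

```python
import random

def make_flat_sparse_signal(n, s, seed=0):
    """Build an n-by-n matrix, stored as a list of n columns, in which every
    column has exactly s nonzero entries (column-wise s-sparsity) and every
    column has l1-norm 1 (l1-column-flatness)."""
    rng = random.Random(seed)
    cols = []
    for _ in range(n):
        support = rng.sample(range(n), s)        # s random row positions
        vals = [rng.uniform(0.5, 2.0) for _ in range(s)]
        total = sum(vals)
        col = [0.0] * n
        for i, v in zip(support, vals):
            col[i] = v / total                   # normalize so |x_j|_1 == 1
        cols.append(col)
    return cols

def triple_norm(cols):
    """|||X|||_1 = max_j |x_j|_1, the objective of the reconstruction programs."""
    return max(sum(abs(v) for v in col) for col in cols)

X = make_flat_sparse_signal(n=8, s=3)
print(triple_norm(X))
```

Since every column is normalized to unit \(l_1\)-norm, the printed value is 1 up to floating-point rounding.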
References
Foucart, S., Rauhut, H.: A Mathematical Introduction to Compressive Sensing. Birkhäuser (2013)
Eldar, Y.C., Kutyniok, G. (eds.): Compressed Sensing: Theory and Applications. Cambridge University Press (2012)
Cohen, D., Eldar, Y.C.: Sub-Nyquist radar systems: temporal, spectral and spatial compression. IEEE Signal Process. Mag. 35(6), 35–57 (2018)
Davenport, M.A., Romberg, J.: An overview of low-rank matrix recovery from incomplete observations. arXiv:1601.06422 (2016)
Duarte, M.F., Baraniuk, R.G.: Kronecker compressive sensing. IEEE Trans. Image Process. 21(2), 494–504 (2012)
Dasarathy, G., Shah, P., Bhaskar, B.N., Nowak, R.: Sketching sparse matrices. arXiv:1303.6544 (2013)
Chandrasekaran, V., Recht, B., Parrilo, P.A., Willsky, A.S.: The convex geometry of linear inverse problems. Found. Comput. Math. 12, 805–849 (2012)
Tropp, J.A.: Convex recovery of a structured signal from independent random linear measurements. In: Pfander, G. (ed.) Sampling Theory: A Renaissance: Compressive Sampling and Other Developments. Birkhäuser (2015)
Mendelson, S.: Learning without concentration. J. ACM 62, 3 (2014)
Mendelson, S., Pajor, A., Tomczak-Jaegermann, N.: Reconstruction and subgaussian operators in asymptotic geometric analysis. Geom. Func. Anal. 17(4), 1248–1282 (2007)
Van Lint, J.H., Wilson, R.M.: A Course in Combinatorics. Springer-Verlag (1995)
Vershynin, R.: High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press (2018)
Ledoux, M., Talagrand, M.: Probability in Banach Spaces: Isoperimetry and Processes. Springer, Heidelberg (1991). https://doi.org/10.1007/978-3-642-20212-4
Dai, W., Li, Y., Zou, J., Xiong, H., Zheng, Y.: Fully decomposable compressive sampling with optimization for multidimensional sparse representation. IEEE Trans. Signal Process. 66(3), 603–616 (2018)
Vaiter, S., Peyré, G., Fadili, J.: Model consistency of partly smooth regularizers. IEEE Trans. Inform. Theory 64(3), 1725–1747 (2018)
Oymak, S., Tropp, J.A.: Universality laws for randomized dimension reduction with applications. Inf. Infer. 7, 337–386 (2017)
Appendices
Appendix A: Proofs of Theorems in Section 3
Proof
of Lemma 1. It is easy to verify that the set \(\{(\lambda_1\xi_1,\ldots,\lambda_n\xi_n): \xi_j\in\partial|\mathbf{x}_j|_1 \text{ and } \lambda_j\ge 0 \text{ for all } j,\ \lambda_1+\cdots+\lambda_n=1 \text{ and } \lambda_j=0 \text{ for } j: |\mathbf{x}_j|_1<\max_k|\mathbf{x}_k|_1\}\) is contained in \(\partial|||\mathrm{X}|||_1\): for any \(\mathrm{M}\equiv(\lambda_1\xi_1,\ldots,\lambda_n\xi_n)\) in this set, we have
and \(|||\cdot|||_1\)'s conjugate norm satisfies \(|||\mathrm{M}|||_1^{*}=\sum_j\lambda_j|\xi_j|_\infty\le\sum_j\lambda_j=1\); as a result \(\mathrm{M}\) is in \(\partial|||\mathrm{X}|||_1\).
Now we prove that any \(\mathrm{M}\) in \(\partial|||\mathrm{X}|||_1\) has the form of a member of the above set. Let \(\mathrm{M}\equiv(\boldsymbol{\eta}_1,\ldots,\boldsymbol{\eta}_n)\); the subgradient inequality \(|||\mathrm{Y}|||_1\ge|||\mathrm{X}|||_1+\langle\mathrm{Y}-\mathrm{X},\mathrm{M}\rangle\) for all \(\mathrm{Y}\equiv(\mathbf{y}_1,\ldots,\mathbf{y}_n)\) implies:
Let \(\boldsymbol{\eta}_j=|\boldsymbol{\eta}_j|_\infty\,\xi_j\) (so \(|\xi_j|_\infty=1\) if \(\boldsymbol{\eta}_j\ne 0\)); then \(\max_j|\mathbf{y}_j|_1\ge\max_j|\mathbf{x}_j|_1+\sum_j|\boldsymbol{\eta}_j|_\infty\langle\mathbf{y}_j-\mathbf{x}_j,\xi_j\rangle\). For each \(j\) with \(\boldsymbol{\eta}_j\ne 0\) we can select an \(i_j\) such that \(|\xi_j(i_j)|=1\); let \(e^{*}_j\) be the vector with components \(e^{*}_j(i_j)=\operatorname{sgn}\xi_j(i_j)\) and \(e^{*}_j(i)=0\) for all \(i\ne i_j\). Then for \(\mathbf{y}_j=\mathbf{x}_j+e^{*}_j\), \(j=1,\ldots,n\), (37) implies
As a result \(1 \ge \sum _{j}\left| \varvec{\eta }_{j}\right| _{\infty }\).
Furthermore, for any given i, let \(\mathbf{y}_j=\mathbf{x}_j\) for all \(j\ne i\) and let \(\mathbf{y}_i\) be any vector satisfying \(|\mathbf{y}_i|_1\le|\mathbf{x}_i|_1\); substituting these \(\mathbf{y}_1,\ldots,\mathbf{y}_n\) into (37) we obtain
i.e., \(\langle\mathbf{y}_i-\mathbf{x}_i,\xi_i\rangle\le 0\). As a result \(\langle\mathbf{x}_i,\xi_i\rangle\ge\langle\mathbf{y}_i,\xi_i\rangle\) for all \(\mathbf{y}_i:|\mathbf{y}_i|_1\le|\mathbf{x}_i|_1\), so \(\langle\mathbf{x}_i,\xi_i\rangle\ge|\mathbf{x}_i|_1|\xi_i|_\infty=|\mathbf{x}_i|_1\); hence finally we get \(\langle\mathbf{x}_i,\xi_i\rangle=|\mathbf{x}_i|_1\). This (together with \(|\xi_i|_\infty=1\)) implies \(\xi_i\in\partial|\mathbf{x}_i|_1\) if \(\boldsymbol{\eta}_i\ne 0\), for any \(i=1,\ldots,n\).
In summary, we have so far proved that any \(\mathrm{M}\) in \(\partial|||\mathrm{X}|||_1\) has the form \((\lambda_1\xi_1,\ldots,\lambda_n\xi_n)\) where \(\xi_j\in\partial|\mathbf{x}_j|_1\), \(\lambda_j\ge 0\) for all j and \(\lambda_1+\cdots+\lambda_n\le 1\). Since \(|||\mathrm{X}|||_1=\langle\mathrm{M},\mathrm{X}\rangle=\sum_j\lambda_j\langle\xi_j,\mathbf{x}_j\rangle=\sum_j\lambda_j|\mathbf{x}_j|_1\le\max_j|\mathbf{x}_j|_1\sum_j\lambda_j\le|||\mathrm{X}|||_1\), as a result \(\lambda_1+\cdots+\lambda_n=1\) and \(\lambda_j=0\) for \(j:|\mathbf{x}_j|_1<\max_k|\mathbf{x}_k|_1\).
\(\square \)
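Lemma 1's characterization admits a quick numerical sanity check (an illustrative sketch with our own helper names, not part of the proof): build \(\mathrm{M}\) from sign vectors on the supports with weights \(\lambda_j\) concentrated on the columns of maximal \(l_1\)-norm, then test the subgradient inequality \(|||\mathrm{Y}|||_1\ge|||\mathrm{X}|||_1+\langle\mathrm{Y}-\mathrm{X},\mathrm{M}\rangle\) at random points:

```python
import random

def l1(v): return sum(abs(t) for t in v)
def tnorm(cols): return max(l1(c) for c in cols)      # |||X|||_1 = max_j |x_j|_1
def inner(A, B):                                      # Frobenius inner product
    return sum(a * b for ca, cb in zip(A, B) for a, b in zip(ca, cb))

rng = random.Random(1)
n = 5
# X stored as a list of columns, entries in {-1, 0, 1}
X = [[rng.choice([-1.0, 0.0, 1.0]) for _ in range(n)] for _ in range(n)]

# M = (lambda_1 xi_1, ..., lambda_n xi_n) as in Lemma 1: xi_j agrees with
# sgn(x_j) on supp(x_j) and is arbitrary in [-1, 1] off the support; the
# weights lambda_j sit on columns attaining max_k |x_k|_1 and sum to 1.
mx = tnorm(X)
maximal = [j for j in range(n) if abs(l1(X[j]) - mx) < 1e-12]
lam = [1.0 / len(maximal) if j in maximal else 0.0 for j in range(n)]
M = [[lam[j] * ((1.0 if x > 0 else -1.0) if x != 0 else rng.uniform(-1.0, 1.0))
      for x in X[j]] for j in range(n)]

# Subgradient inequality at 200 random test points Y
worst_gap = float("inf")
for _ in range(200):
    Y = [[rng.uniform(-2.0, 2.0) for _ in range(n)] for _ in range(n)]
    D = [[y - x for y, x in zip(cy, cx)] for cy, cx in zip(Y, X)]
    worst_gap = min(worst_gap, tnorm(Y) - tnorm(X) - inner(D, M))
print(worst_gap >= -1e-9)
```

Because such an \(\mathrm{M}\) satisfies \(\langle\mathrm{M},\mathrm{X}\rangle=|||\mathrm{X}|||_1\) and \(|||\mathrm{M}|||_1^{*}\le 1\), the gap is nonnegative up to rounding.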
Proof
of Theorem 1. To prove the necessity, let \(\mathrm{S}\) be an s-sparsity pattern and \(\mathrm{H}\in\ker\varPhi\backslash\{\mathrm{O}\}\). Set \(\mathbf{y}\equiv\varPhi(\mathrm{H}_{\mathrm{S}})=\varPhi(-\mathrm{H}_{\sim\mathrm{S}})\) with \(\mathrm{H}_{\mathrm{S}}\in\sum_{s}^{n\times n}\). Then \(\mathrm{H}_{\mathrm{S}}\) should be the unique minimizer of \(\mathrm{MP}_{\mathbf{y},\varPhi,0}\), with \(-\mathrm{H}_{\sim\mathrm{S}}\) a feasible solution; hence \(|||\mathrm{H}_{\mathrm{S}}|||_1<|||\mathrm{H}_{\sim\mathrm{S}}|||_1\).
Now we prove the sufficiency. Let \(\mathrm{X}=(\mathbf{x}_1,\ldots,\mathbf{x}_n)\) be a matrix signal whose support \(\mathrm{S}=\mathrm{S}_1\cup\ldots\cup\mathrm{S}_n\) is an s-sparsity pattern (where \(\mathrm{S}_j=\operatorname{supp}(\mathbf{x}_j)\)) and let \(\mathbf{y}=\varPhi(\mathrm{X})\). For any feasible solution \(\mathrm{Z}(\ne\mathrm{X})\) of \(\mathrm{MP}_{\mathbf{y},\varPhi,0}\), there obviously exists \(\mathrm{H}=(\mathbf{h}_1,\ldots,\mathbf{h}_n)\) in \(\ker\varPhi\backslash\{\mathrm{O}\}\) such that \(\mathrm{Z}=\mathrm{X}+\mathrm{H}\). Since \(|||\mathrm{Z}|||_1\ge|||\mathrm{X}|||_1+\langle\mathrm{H},\mathrm{M}\rangle\) for any \(\mathrm{M}\) in \(\partial|||\mathrm{X}|||_1\), we have
\(|||\mathrm{Z}|||_1-|||\mathrm{X}|||_1\ge\sup\{\langle\mathrm{H},\mathrm{M}\rangle:\mathrm{M}\in\partial|||\mathrm{X}|||_1\}\)
\(=\sup\{\langle\mathrm{H},\mathrm{M}\rangle:\mathrm{M}=\mathrm{E}+\mathrm{V}\) where \(\mathrm{E}=(\lambda_1\operatorname{sgn}(\mathbf{x}_1),\ldots,\lambda_n\operatorname{sgn}(\mathbf{x}_n))\) and \(\mathrm{V}=(\lambda_1\xi_1,\ldots,\lambda_n\xi_n)\), \(|\xi_j|_\infty\le 1\), \(\lambda_j\ge 0\) for all j, \(\lambda_1+\cdots+\lambda_n=1\}\) (by Lemma 1, noticing \(\operatorname{supp}(\operatorname{sgn}(\mathbf{x}_j))=\mathrm{S}_j=\sim\operatorname{supp}(\xi_j)\))
\(\ge\sup\{-|\langle\mathrm{H},\mathrm{E}\rangle|+\langle\mathrm{H},\mathrm{V}\rangle:\mathrm{E}\) and \(\mathrm{V}\) specified as above\(\}\)
\(=\sup\{-|\sum_{j=1}^{n}\lambda_j\langle\mathbf{h}_{j|\mathrm{S}_j},\operatorname{sgn}(\mathbf{x}_j)\rangle|+\sum_{j=1}^{n}\lambda_j\langle\mathbf{h}_{j|\sim\mathrm{S}_j},\xi_j\rangle:\lambda_j\) and \(\xi_j\) specified as above\(\}\)
\(\ge-\sup\{|\sum_{j=1}^{n}\lambda_j\langle\mathbf{h}_{j|\mathrm{S}_j},\operatorname{sgn}(\mathbf{x}_j)\rangle|:\lambda_j\ge 0\) for all j, \(\lambda_1+\cdots+\lambda_n=1\}+\sup\{\langle\mathrm{H}_{\sim\mathrm{S}},\mathrm{V}\rangle:|||\mathrm{V}|||_1^{*}\le 1\}\)
(note that \(|||\mathrm{V}|||_1^{*}=\sum_j|\lambda_j\xi_j|_\infty\le\sum_j\lambda_j=1\), where \(|||\cdot|||_1^{*}\) is \(|||\cdot|||_1\)'s conjugate norm)
\(=-\sup\{|\sum_{j=1}^{n}\lambda_j\langle\mathbf{h}_{j|\mathrm{S}_j},\operatorname{sgn}(\mathbf{x}_j)\rangle|:\lambda_j\ge 0\) for all j, \(\lambda_1+\cdots+\lambda_n=1\}+|||\mathrm{H}_{\sim\mathrm{S}}|||_1\)
\(=-\max_j|\langle\mathbf{h}_{j|\mathrm{S}_j},\operatorname{sgn}(\mathbf{x}_j)\rangle|+|||\mathrm{H}_{\sim\mathrm{S}}|||_1\)
The last quantity is positive under condition (3.3). As a result, X is the unique minimizer of \(\mathrm{MP}_{\mathbf{y},\varPhi,0}\). \(\square\)
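The null-space comparison \(|||\mathrm{H}_{\mathrm{S}}|||_1<|||\mathrm{H}_{\sim\mathrm{S}}|||_1\) is mechanical to evaluate for a given matrix and pattern. The following toy sketch (the matrix H and the pattern are hypothetical, chosen only for illustration, with H standing in for a kernel element of \(\varPhi\)) computes both sides:

```python
def l1(v): return sum(abs(t) for t in v)
def tnorm(cols): return max(l1(c) for c in cols)      # |||H|||_1 = max column l1

def restrict(H, rows):
    """Keep only the entries of H (stored as columns) whose row index is in `rows`."""
    return [[c[i] if i in rows else 0.0 for i in range(len(c))] for c in H]

# Hypothetical 5x5 matrix H (five columns) and a 2-sparsity pattern S
# consisting of rows {0, 1} in every column.
H = [[ 0.10, -0.10,  0.80,  0.70,  0.90],
     [ 0.05,  0.10, -0.60,  0.90,  0.75],
     [-0.10,  0.05,  0.70, -0.80,  0.65],
     [ 0.10, -0.05, -0.90,  0.60,  0.70],
     [-0.05,  0.10,  0.85,  0.75, -0.60]]
S_rows = {0, 1}
HS = restrict(H, S_rows)                     # part of H on the pattern
HcS = restrict(H, set(range(5)) - S_rows)    # part of H off the pattern
print(tnorm(HS), tnorm(HcS))                 # the condition asks tnorm(HS) < tnorm(HcS)
```

Here the on-pattern mass is small relative to the off-pattern mass, so this particular H satisfies the strict inequality; Theorem 1 requires it for every nonzero H in \(\ker\varPhi\).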
The proof of Theorem 2 follows almost the same logic as the proof of \(l_1\)-min reconstruction stability for vector signals under the \(l_1\) Null Space Property assumption (e.g., see Sect. 4.2 in [1]). For completeness we provide the short proof here. The basic tool is an auxiliary inequality (which unfortunately does not hold for the matrix norm \(|||\cdot|||_1\)): given an index subset \(\varDelta\) and any vectors \(\mathbf{x},\mathbf{z}\) with \(\mathbf{h}=\mathbf{z}-\mathbf{x}\), then\(^{[1]}\) \(|\mathbf{h}_{|\sim\varDelta}|_1\le|\mathbf{z}|_1-|\mathbf{x}|_1+|\mathbf{h}_{|\varDelta}|_1+2|\mathbf{x}_{|\sim\varDelta}|_1\) (38)
Proof
of Theorem 2. For any feasible solution \(\mathrm{Z}=(\mathbf{z}_1,\ldots,\mathbf{z}_n)\) to problem \(\mathrm{MP}_{\mathbf{y},\varPhi,0}\) where \(\mathbf{y}=\varPhi(\mathrm{X})\), there is \(\mathrm{H}=(\mathbf{h}_1,\ldots,\mathbf{h}_n)\) in \(\ker\varPhi\) such that \(\mathrm{Z}=\mathrm{H}+\mathrm{X}\). Applying (38) to each pair of column vectors \(\mathbf{z}_j\) and \(\mathbf{x}_j\) we get
Hence \(|||\mathrm{H}_{\sim\mathrm{S}}|||_1\equiv\max_j|\mathbf{h}_{j|\sim\mathrm{S}_j}|_1\le\max_j(|\mathbf{z}_j|_1-|\mathbf{x}_j|_1)+|||\mathrm{H}_{\mathrm{S}}|||_1+2\max_j|\mathbf{x}_{j|\sim\mathrm{S}_j}|_1\le\max_j(|\mathbf{z}_j|_1-|\mathbf{x}_j|_1)+\rho|||\mathrm{H}_{\sim\mathrm{S}}|||_1+2\max_j|\mathbf{x}_{j|\sim\mathrm{S}_j}|_1\) (by (10)), namely:
As a result \(|||\mathrm{H}|||_1=|||\mathrm{H}_{\mathrm{S}}|||_1+|||\mathrm{H}_{\sim\mathrm{S}}|||_1\le(1+\rho)|||\mathrm{H}_{\sim\mathrm{S}}|||_1\le(1-\rho)^{-1}(1+\rho)\big(2\max_j|\mathbf{x}_{j|\sim\mathrm{S}_j}|_1+\max_j(|\mathbf{z}_j|_1-|\mathbf{x}_j|_1)\big)\) for any s-sparsity pattern S, which implies (11) since \(\min_{\mathrm{S}}\max_j|\mathbf{x}_{j|\sim\mathrm{S}_j}|_1=\max_j\sigma_s(\mathbf{x}_j)_1\).
In particular, if Z is the minimizer \(\mathrm{X}^{*}\) and X is \(l_1\)-column-flat, then \(|\mathbf{x}_j|_1=|||\mathrm{X}|||_1\) for every j, so \(\max_j(|\mathbf{x}^{*}_j|_1-|\mathbf{x}_j|_1)=|||\mathrm{X}^{*}|||_1-|||\mathrm{X}|||_1\le 0\) for the minimizer \(\mathrm{X}^{*}\), which implies the conclusion. \(\square\)
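The column-wise auxiliary inequality (38) used in this proof is easy to probe numerically. A minimal sketch (our own test harness, not from the paper) sampling random vectors x, z and index subsets \(\varDelta\):

```python
import random

def l1(v): return sum(map(abs, v))

rng = random.Random(7)
n = 6
worst = 0.0
for _ in range(1000):
    x = [rng.uniform(-1, 1) for _ in range(n)]
    z = [rng.uniform(-1, 1) for _ in range(n)]
    D = set(rng.sample(range(n), rng.randint(0, n)))   # random index subset Delta
    h = [zz - xx for zz, xx in zip(z, x)]
    # |h_{~D}|_1 <= |z|_1 - |x|_1 + |h_D|_1 + 2|x_{~D}|_1
    lhs = sum(abs(h[i]) for i in range(n) if i not in D)
    rhs = (l1(z) - l1(x)
           + sum(abs(h[i]) for i in D)
           + 2 * sum(abs(x[i]) for i in range(n) if i not in D))
    worst = max(worst, lhs - rhs)
print(worst)
```

The inequality follows from the two triangle-inequality bounds \(|\mathbf{z}_{|\varDelta}|_1\ge|\mathbf{x}_{|\varDelta}|_1-|\mathbf{h}_{|\varDelta}|_1\) and \(|\mathbf{z}_{|\sim\varDelta}|_1\ge|\mathbf{h}_{|\sim\varDelta}|_1-|\mathbf{x}_{|\sim\varDelta}|_1\), so the printed worst violation stays at rounding level.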
Remark: For any flat and sparse signal X, condition (10) guarantees that X can be uniquely reconstructed by solving \(\mathrm{MP}_{\mathbf{y},\varPhi,0}\) due to Theorem 1, while in this case the right-hand side of (12) is zero, i.e., this theorem is consistent with the former one. In addition, (12) indicates that the error of the minimizer \(\mathrm{X}^{*}\) in approximating the flat but non-sparse signal X is controlled column-wise by X's non-sparsity (measured by \(\max_j\sigma_s(\mathbf{x}_j)_1\)).
Proof
of Theorem 3. Consider first the problem \(\mathrm{MP}_{\mathbf{y},\varPhi,\eta}\): inf \(|||\mathrm{Z}|||_1\) s.t. \(\mathrm{Z}\in R^{n\times n}\), \(|\mathbf{y}-\varPhi_{S}(\mathrm{Z})|_2\le\eta\), where \(\eta>0\). For any minimizer \(\mathrm{X}^{*}\) of this problem, with both the objective \(|||\cdot|||_1\) and the constraint function \(|\mathbf{y}-\varPhi_{S}(\cdot)|_2\) convex, according to general convex optimization theory there exist a positive multiplier \(\gamma^{*}>0\) and \(\mathrm{M}^{*}\) in \(\partial|||\mathrm{X}^{*}|||_1\) such that
then \(\mathrm{M}^{*}=\gamma^{*}\varPhi_{S}^{T}(\mathbf{y}-\varPhi_{S}(\mathrm{X}^{*}))\) cannot have any zero column since \(\mathbf{y}-\varPhi_{S}(\mathrm{X}^{*})\ne\mathbf{0}\), which implies \(|\mathbf{x}^{*}_j|_1=\max_k|\mathbf{x}^{*}_k|_1\) for every j according to Lemma 1.
Now consider the problem \(\mathrm{MP}_{\mathbf{y},\varPhi,0}\): inf \(|||\mathrm{Z}|||_1\) s.t. \(\mathrm{Z}\in R^{n\times n}\), \(\mathbf{y}=\varPhi_{S}(\mathrm{Z})\). For its minimizer \(\mathrm{X}^{*}\) there is a multiplier vector u such that \(\mathrm{M}^{*}+\varPhi_{S}^{T}(\mathbf{u})=\mathrm{O}\). If \(\mathbf{u}\ne\mathbf{0}\) then \(\mathrm{M}^{*}\) does not have any zero column, which implies \(|\mathbf{x}_j|_1=\max_k|\mathbf{x}_k|_1\) for every j according to Lemma 1. On the other hand, \(\mathbf{u}=\mathbf{0}\) implies \(\mathrm{M}^{*}=\mathrm{O}\), which cannot happen according to Lemma 1 unless \(\mathrm{X}^{*}=\mathrm{O}\). \(\square\)
Proof
of Theorem 4. For any feasible solution \(\mathrm{Z}=(\mathbf{z}_1,\ldots,\mathbf{z}_n)\) to problem \(\mathrm{MP}_{\mathbf{y},\varPhi,\eta}\) where \(\mathbf{y}=\varPhi(\mathrm{X})+e\), let \(\mathrm{Z}-\mathrm{X}=\mathrm{H}=(\mathbf{h}_1,\ldots,\mathbf{h}_n)\). Applying (38) to each pair of column vectors \(\mathbf{z}_j\) and \(\mathbf{x}_j\) we get \(|\mathbf{h}_{j|\sim\mathrm{S}_j}|_1\le|\mathbf{z}_j|_1-|\mathbf{x}_j|_1+|\mathbf{h}_{j|\mathrm{S}_j}|_1+2|\mathbf{x}_{j|\sim\mathrm{S}_j}|_1\). Hence \(|||\mathrm{H}_{\sim\mathrm{S}}|||_1\equiv\max_j|\mathbf{h}_{j|\sim\mathrm{S}_j}|_1\le\max_j(|\mathbf{z}_j|_1-|\mathbf{x}_j|_1)+|||\mathrm{H}_{\mathrm{S}}|||_1+2\max_j|\mathbf{x}_{j|\sim\mathrm{S}_j}|_1\le\max_j(|\mathbf{z}_j|_1-|\mathbf{x}_j|_1)+\rho|||\mathrm{H}_{\sim\mathrm{S}}|||_1+2\max_j|\mathbf{x}_{j|\sim\mathrm{S}_j}|_1+\beta|\varPhi(\mathrm{H})|_2\) (by (14)), namely:
As a result \(|||\mathrm{H}|||_1=|||\mathrm{H}_{\mathrm{S}}|||_1+|||\mathrm{H}_{\sim\mathrm{S}}|||_1\le(1+\rho)|||\mathrm{H}_{\sim\mathrm{S}}|||_1+\beta|\varPhi(\mathrm{H})|_2\le(1-\rho)^{-1}(1+\rho)\big(2\max_j|\mathbf{x}_{j|\sim\mathrm{S}_j}|_1+\max_j(|\mathbf{z}_j|_1-|\mathbf{x}_j|_1)\big)+2(1-\rho)^{-1}\beta|\varPhi(\mathrm{H})|_2\) for any s-sparsity pattern S, which implies (15) since \(\min_{\mathrm{S}}\max_j|\mathbf{x}_{j|\sim\mathrm{S}_j}|_1=\max_j\sigma_s(\mathbf{x}_j)_1\).
In particular, if Z is a minimizer \(\mathrm{X}^{*}\) and X is \(l_1\)-column-flat, then \(|\mathbf{x}_j|_1=|||\mathrm{X}|||_1\) for every j, so \(\max_j(|\mathbf{x}^{*}_j|_1-|\mathbf{x}_j|_1)=|||\mathrm{X}^{*}|||_1-|||\mathrm{X}|||_1\le 0\) for the minimizer \(\mathrm{X}^{*}\), which implies (16). \(\square\)
Proof
of Theorem 5. Let \(\mathrm{H}=(\mathbf{h}_1,\ldots,\mathbf{h}_n)\) be any n-by-n matrix. For each j suppose \(|h_j(i_1)|\ge|h_j(i_2)|\ge\ldots\ge|h_j(i_n)|\); let \(\mathrm{S}_0(j)=\{(i_1,j),\ldots,(i_s,j)\}\), i.e., the set of indices of the s components of column \(\mathbf{h}_j\) with the largest absolute values, let \(\mathrm{S}_1(j)=\{(i_{1+s},j),\ldots,(i_{2s},j)\}\) be the set of indices of the s components of \(\mathbf{h}_j\) with the next-largest absolute values, etc., and for any \(k=0,1,2,\ldots\) let \(\mathrm{S}_k=\cup_{j=1}^{n}\mathrm{S}_k(j)\); obviously \(\mathrm{H}=\sum_{k\ge 0}\mathrm{H}_{\mathrm{S}_k}\). First we note that (20) holds for S as long as it holds for \(\mathrm{S}_0\), so we prove the latter in the following. Start from condition (1):
\((1-\delta_s)|\mathrm{H}_{\mathrm{S}_0}|_F^2\le|\varPhi(\mathrm{H}_{\mathrm{S}_0})|_2^2=\langle\varPhi(\mathrm{H}_{\mathrm{S}_0}),\varPhi(\mathrm{H})-\sum_{k\ge 1}\varPhi(\mathrm{H}_{\mathrm{S}_k})\rangle\)
\(=\langle\varPhi(\mathrm{H}_{\mathrm{S}_0}),\varPhi(\mathrm{H})\rangle-\sum_{k\ge 1}\langle\varPhi(\mathrm{H}_{\mathrm{S}_0}),\varPhi(\mathrm{H}_{\mathrm{S}_k})\rangle\)
\(\le|\varPhi(\mathrm{H}_{\mathrm{S}_0})|_2|\varPhi(\mathrm{H})|_2+(\varDelta_s/n)\sum_{j=1}^{n}\sum_{k\ge 1}|\mathbf{h}_{j|\mathrm{S}_0(j)}|_2|\mathbf{h}_{j|\mathrm{S}_k(j)}|_2\) (by condition (2))
\(\le(1+\delta_s)^{1/2}|\mathrm{H}_{\mathrm{S}_0}|_F|\varPhi(\mathrm{H})|_2+(\varDelta_s/n)|\mathrm{H}_{\mathrm{S}_0}|_F\sum_{j=1}^{n}\sum_{k\ge 1}|\mathbf{h}_{j|\mathrm{S}_k(j)}|_2\) (by condition (1) and \(|\mathbf{h}_{j|\mathrm{S}_0(j)}|_2\le|\mathrm{H}_{\mathrm{S}_0}|_F\))
\(\le(1+\delta_s)^{1/2}|\mathrm{H}_{\mathrm{S}_0}|_F|\varPhi(\mathrm{H})|_2+(\varDelta_s/n)|\mathrm{H}_{\mathrm{S}_0}|_F\sum_{j=1}^{n}\big(s^{-1/2}|\mathbf{h}_{j|\sim\mathrm{S}_0(j)}|_1+(1/4)|\mathbf{h}_{j|\mathrm{S}_0(j)}|_2\big)\) (by the inequality \((\sum_{k=1}^{s}a_k^2)^{1/2}\le s^{-1/2}\sum_{k=1}^{s}a_k+(s^{1/2}/4)(a_1-a_s)\) for \(a_1\ge a_2\ge\ldots\ge a_s\ge 0\), and the fact that \(\min_{1\le i\le s}|\mathbf{h}_{j|\mathrm{S}_k(j)}(i)|\ge\max_{1\le i\le s}|\mathbf{h}_{j|\mathrm{S}_{k+1}(j)}(i)|\) for any j)
Cancelling \(|\mathrm{H}_{\mathrm{S}_0}|_F\) on both sides, we get \((1-\delta_s)|\mathrm{H}_{\mathrm{S}_0}|_F\le(1+\delta_s)^{1/2}|\varPhi(\mathrm{H})|_2+s^{-1/2}\varDelta_s|||\mathrm{H}_{\sim\mathrm{S}_0}|||_1+(\varDelta_s/4n^{1/2})|\mathrm{H}_{\mathrm{S}_0}|_F\), hence
\(|\mathrm{H}_{\mathrm{S}_0}|_F\le(1-\delta_s-\varDelta_s/4n^{1/2})^{-1}\big((1+\delta_s)^{1/2}|\varPhi(\mathrm{H})|_2+s^{-1/2}\varDelta_s|||\mathrm{H}_{\sim\mathrm{S}_0}|||_1\big)\)
Note that \(|||\mathrm{H}_{\mathrm{S}_0}|||_1=\max_j|\mathbf{h}_{j|\mathrm{S}_0(j)}|_1\le s^{1/2}\max_j|\mathbf{h}_{j|\mathrm{S}_0(j)}|_2\le s^{1/2}|\mathrm{H}_{\mathrm{S}_0}|_F\); combining this with the above inequality, we obtain (20) and (21) for \(\mathrm{S}_0\), which implies that they hold for any S. \(\square\)
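The sorted-block inequality invoked above, \((\sum_{k=1}^{s}a_k^2)^{1/2}\le s^{-1/2}\sum_{k=1}^{s}a_k+(s^{1/2}/4)(a_1-a_s)\), can be probed numerically; a small sketch (our own harness, not from the paper):

```python
import random
import math

rng = random.Random(3)
worst = 0.0
for _ in range(2000):
    s = rng.randint(1, 10)
    # a_1 >= a_2 >= ... >= a_s >= 0, as the inequality requires
    a = sorted((rng.uniform(0.0, 1.0) for _ in range(s)), reverse=True)
    lhs = math.sqrt(sum(t * t for t in a))
    rhs = sum(a) / math.sqrt(s) + (math.sqrt(s) / 4.0) * (a[0] - a[-1])
    worst = max(worst, lhs - rhs)
print(worst)
```

Equality is attained, e.g., when all \(a_k\) are equal (both sides become \(a_1\sqrt{s}\)), so the printed maximum of lhs − rhs sits at or just below zero up to rounding.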
Appendix B: Proofs of Theorems in Section 4
Proof
of Lemma 2. (1) Observe that when \(\varPhi_{S}^{T}\varPhi_{S}\) is a bijection, (23)'s objective function \(L_S(\mathrm{Z})=|||\mathrm{Z}|||_1+(1/2)\gamma|\mathbf{y}-\varPhi_{S}(\mathrm{Z})|_2^2\) is strictly convex in the variable \(\mathrm{Z}\in\sum_s^{n\times n}(S)\). According to general convex programming theory, its minimizer \(\mathrm{X}^{*}_{S}\) is unique.
(2) Let \(L(\mathrm{Z}):=|||\mathrm{Z}|||_1+(1/2)\gamma|\mathbf{y}-\varPhi(\mathrm{Z})|_2^2\). To prove that \(\mathrm{X}^{*}_{S}\) is also the global minimizer of (22), we prove that any perturbation by H increases the objective value, i.e., \(L(\mathrm{X}^{*}_{S}+\mathrm{H})>L(\mathrm{X}^{*}_{S})\) under the conditions specified by (1), (2), (3). Since conclusion (1) implies \(L(\mathrm{X}^{*}_{S}+\mathrm{H})>L(\mathrm{X}^{*}_{S})\) for any \(\mathrm{H}\ne\mathrm{O}\) with support in S, and \(L(\mathrm{Z})\) is convex, we only need to consider perturbations \(\mathrm{X}^{*}_{S}+\mathrm{H}\) with \(\mathrm{H}_{S}=\mathrm{O}\).
Since \(\text {X}^{*}_{S}\) is the minimizer of (23), by first-order optimization condition there exists \(\text {M}^{*}\) in \(\partial {}\vert {}\vert {}\vert {}\text {X}^{*}_{S}\vert {}\vert {}\vert {}_{1 }\) such that
then \(\text {M}^{*} = \gamma {}^{*}\varPhi {}_{S}^{T}(\mathbf{y} - \varPhi {}_{S}(\text {X}^{*}_{S}))\) and in particular \(\text {M}^{*}_{\sim {}S }= \text {O}\). Equivalently:
Now we compute \(\textit{ L}(\text {X}^{*}_{S}+\text {H}) - \textit{L}(\text {X}^{*}_{S})\)
\(=|||\mathrm{X}^{*}_{S}+\mathrm{H}|||_1-|||\mathrm{X}^{*}_{S}|||_1+(1/2)\gamma\big(|\varPhi(\mathrm{X}^{*}_{S})-\mathbf{y}|_2^2+2\langle\varPhi(\mathrm{X}^{*}_{S})-\mathbf{y},\varPhi(\mathrm{H})\rangle+|\varPhi(\mathrm{H})|_2^2-|\varPhi(\mathrm{X}^{*}_{S})-\mathbf{y}|_2^2\big)\)
\(=|||\mathrm{X}^{*}_{S}+\mathrm{H}|||_1-|||\mathrm{X}^{*}_{S}|||_1+\gamma\langle\varPhi(\mathrm{X}^{*}_{S})-\mathbf{y},\varPhi(\mathrm{H})\rangle+(1/2)\gamma|\varPhi(\mathrm{H})|_2^2\)
\(=|||\mathrm{X}^{*}_{S}+\mathrm{H}|||_1-|||\mathrm{X}^{*}_{S}|||_1+\gamma\langle\varPhi(\mathrm{X}^{*}_{S})-\mathbf{y},\varPhi_{\sim S}(\mathrm{H})\rangle+(1/2)\gamma|\varPhi_{\sim S}(\mathrm{H})|_2^2\)
\(\ge|||\mathrm{X}^{*}_{S}+\mathrm{H}|||_1-|||\mathrm{X}^{*}_{S}|||_1+\gamma\langle\varPhi(\mathrm{X}^{*}_{S})-\mathbf{y},\varPhi_{\sim S}(\mathrm{H})\rangle\)
The first term \(|||\mathrm{X}^{*}_{S}+\mathrm{H}|||_1-|||\mathrm{X}^{*}_{S}|||_1\)
\(=\max_j(|\mathbf{x}^{*}_j|_1+|\mathbf{h}_j|_1)-\max_j|\mathbf{x}^{*}_j|_1\) (since \(\operatorname{supp}(\mathrm{X}^{*}_{S})\cap\operatorname{supp}(\mathrm{H})=\varnothing\))
\(=|||\mathrm{X}^{*}_{S}|||_1+|||\mathrm{H}|||_1-|||\mathrm{X}^{*}_{S}|||_1\) (condition (1) implies \(\mathrm{X}^{*}_{S}\)'s \(l_1\)-column-flatness: remark after Theorem 3)
\(=|||\mathrm{H}|||_1\)
Substituting (41) for \(\varPhi(\mathrm{X}^{*}_{S})-\mathbf{y}\), and noting \(\operatorname{supp}(\varPhi_{\sim S}^{T})=\sim\mathrm{S}\) and \(|||\mathrm{M}|||_1^{*}\le 1\), the second term
Therefore
and condition (3) implies that the right-hand side is positive. This proves that \(\mathrm{X}^{*}_{S}\) is the minimizer of (22) and that the minimizer is unique.
(3) For \(\mathrm{Y}^{*}=\varPhi_{S}^{*-1}(\mathbf{y})\in\sum_s^{n\times n}(S)\) (so \(\operatorname{supp}(\mathrm{Y}^{*})\) is in S), by (41) we have
(4) Note that for scalars u and v, \(|u|>|u-v|\) implies \(v\ne 0\) and \(\operatorname{sgn}(u)=\operatorname{sgn}(v)\). Therefore
In particular, if \(\min_{(i,j)\in\mathrm{S}}|\mathrm{Y}^{*}_{ij}|>\gamma^{-1}N((\varPhi_{S}^{T}\varPhi_{S})^{-1}:|||\cdot|||_1^{*}\rightarrow|||\cdot|||_{max})\) then \(|\mathrm{Y}^{*}_{ij}|>\gamma^{-1}\max_{(i,j)}|(\varPhi_{S}^{T}\varPhi_{S})^{-1}(\mathrm{M}^{*})_{ij}|\), so \(\operatorname{sgn}(\mathrm{X}^{*}_{S,ij})=\operatorname{sgn}(\mathrm{Y}^{*}_{ij})\) for all (i, j) in S. \(\square\)
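The scalar sign criterion used in step (4) — \(|u|>|u-v|\) forces \(\operatorname{sgn}(u)=\operatorname{sgn}(v)\), while the converse implication fails in general — can be checked directly (a small sketch, our own harness):

```python
import random

def sgn(t):
    return (t > 0) - (t < 0)

rng = random.Random(5)
violations = 0
for _ in range(5000):
    u, v = rng.uniform(-2, 2), rng.uniform(-2, 2)
    # |u| > |u - v| should force v != 0 and sgn(u) == sgn(v)
    if abs(u) > abs(u - v) and (v == 0 or sgn(u) != sgn(v)):
        violations += 1

# The converse fails: u = 1, v = 3 have equal signs but |u| <= |u - v|
converse_fails = sgn(1.0) == sgn(3.0) and not (abs(1.0) > abs(1.0 - 3.0))
print(violations, converse_fails)
```

Geometrically, \(|u|>|u-v|\) places v in the open interval between 0 and 2u, which has the same sign as u throughout; the counterexample shows why the criterion is only a sufficient test for sign agreement.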
Proof
of Lemma 3. (1) Let \(\mathrm{X}^{*}_{S}\in\mathrm{Arg}\inf|||\mathrm{Z}|||_1\) s.t. \(\mathrm{Z}\in R^{n\times n}\), \(|\mathbf{y}-\varPhi_{S}(\mathrm{Z})|_2\le\eta\), i.e., a minimizer with its support restricted to S. We first prove that \(\mathrm{X}^{*}_{S}\) is the only minimizer of this support-restricted problem; then we prove that \(\mathrm{X}^{*}_{S}\) is also the minimizer of problem \(\mathrm{MP}_{\mathbf{y},\varPhi,\eta}\) (25), i.e., \(\mathrm{X}^{*}_{S}\) is the global minimizer and (25)'s minimizer is unique.
According to general convex optimization theory, there exist a positive multiplier \(\gamma {}^{*} > 0\) and \(\text {M}^{*}\) in \(\partial {}\vert {}\vert {}\vert {}\text {X}^{*}_{S}\vert {}\vert {}\vert {}_{1}\) such that
then equivalently
Suppose \(\text {X}^{0}\) is another minimizer of inf \(\vert {}\vert {}\vert {}\text {Z}\vert {}\vert {}\vert {}_{1}\) s.t. \(\text {Z}\ \in {}R^{n\times {}n}\), \(\vert {}{} \mathbf{y} -\varPhi {}_{S}\text {(Z)}\vert {}_{2 }\le {} \eta {}\), then there exist a positive multiplier \(\gamma {}^{0}>\) 0 and \(\text {M}^{0}\) in \(\partial {}\vert {}\vert {}\vert {}\text {X}^{0}\vert {}\vert {}\vert {}_{1 }\) such that
Equivalently, (45) shows that \(\mathrm{X}^{*}_{S}\) is also a minimizer of \(L_S(\mathrm{Z})=|||\mathrm{Z}|||_1+(1/2)\gamma^{*}|\mathbf{y}-\varPhi_{S}(\mathrm{Z})|_2^2\), which is a strictly convex function on \(\sum_s^{n\times n}(S)\) since \(\varPhi_{S}^{T}\varPhi_{S}\) is a bijection (condition (2)); as a result \(L_S(\mathrm{Z})\)'s minimizer is unique. However, since \(|||\mathrm{X}^{*}_{S}|||_1=|||\mathrm{X}^{0}|||_1\) we have \(L_S(\mathrm{X}^{*}_{S})=|||\mathrm{X}^{*}_{S}|||_1+(1/2)\gamma^{*}|\mathbf{y}-\varPhi_{S}(\mathrm{X}^{*}_{S})|_2^2=|||\mathrm{X}^{*}_{S}|||_1+\gamma^{*}\eta^2/2=|||\mathrm{X}^{0}|||_1+(\gamma^{*}/2)|\mathbf{y}-\varPhi_{S}(\mathrm{X}^{0})|_2^2=L_S(\mathrm{X}^{0})\), which implies \(\mathrm{X}^{*}_{S}=\mathrm{X}^{0}\), i.e., \(\mathrm{X}^{*}_{S}\) is the unique minimizer of the support-restricted problem inf \(|||\mathrm{Z}|||_1\) s.t. \(\mathrm{Z}\in R^{n\times n}\), \(|\mathbf{y}-\varPhi_{S}(\mathrm{Z})|_2\le\eta\).
\(\mathrm{X}^{*}_{S}\)'s \(l_1\)-column-flatness is implied by condition (1) and Theorem 3.
Now we prove that \(\mathrm{X}^{*}_{S}\) (which is S-sparse and \(l_1\)-column-flat) is also a minimizer of problem \(\mathrm{MP}_{\mathbf{y},\varPhi,\eta}\) (25). Again we start with the fact that \(\mathrm{X}^{*}_{S}=\mathrm{Arg}\inf L_S(\mathrm{Z})=\mathrm{Arg}\inf|||\mathrm{Z}|||_1+(1/2)\gamma^{*}|\mathbf{y}-\varPhi_{S}(\mathrm{Z})|_2^2\) with some multiplier \(\gamma^{*}>0\) (whose value depends on \(\mathrm{X}^{*}_{S}\)); by Lemma 2, \(\mathrm{X}^{*}_{S}\) is the unique minimizer of the convex problem (without any restriction on the solution's support)
under the condition
According to convex optimization theory, \(\text {X}^{*}_{S}\) (under condition (49)) being the unique minimizer of problem (48) means \(\text {X}^{*}_{S }\) is also a minimizer of \(\text {MP}_{y,\varPhi {},\eta {}}\) (25), which furthermore implies that \(\text {MP}_{y,\varPhi {},\eta {}}\)’s minimizer is unique, S-sparse and \(\textit{l}_{1}\)-column-flat.
In order to make condition (49) more meaningful, we need to replace the minimizer-dependent parameter \(\gamma {}^{*}\) with explicit information. From (48)’s first-order optimization condition (45) we obtain
\(1 \ge |||\text{M}^{*}|||^{*}_{1} = \gamma^{*}|||\varPhi_{S}^{T}(\varPhi_{S}(\text{X}^{*}_{S})-\mathbf{y})|||^{*}_{1} \ge \gamma^{*}\min\{|||\varPhi_{S}^{T}(\mathbf{z})|||^{*}_{1}: |\mathbf{z}|_{2}=1\}\,|\varPhi_{S}(\text{X}^{*}_{S})-\mathbf{y}|_{2} = \gamma^{*}\eta\varLambda_{min}(\varPhi_{S}^{T})\)
i.e., \(\gamma^{*} \le (\eta\varLambda_{min}(\varPhi_{S}^{T}))^{-1}\).
With this upper bound on \(\gamma^{*}\), (49) can be derived from the uniform condition
which is equivalent to condition (3).
From now on we denote \(\text {X}^{*}_{S}\) as \(\text {X}^{*}\).
(2) For \(\text{Y}^{*} = \varPhi_{S}^{*-1}(\mathbf{y})\in\sum _s^{n\times {}n}(S)\) and by Lemma 2's conclusion (4), if \(\min_{(i,j)\in S}|\text{Y}^{*}_{ij}| > \gamma^{*-1}N((\varPhi_{S}^{T}\varPhi_{S})^{-1}: |||\cdot|||_{1}^{*}\rightarrow|||\cdot|||_{max})\) then \(sgn(\text{X}^{*}_{S,ij}) = sgn(\text{Y}^{*}_{ij})\) for all \((i,j)\) in S. To replace the multiplier \(\gamma^{*}\) with more explicit information in this condition, we need a lower bound on \(\gamma^{*}\), which can be derived from the first-order optimality condition \(\text{M}^{*} = \gamma^{*}\varPhi_{S}^{T}(\mathbf{y}-\varPhi_{S}(\text{X}^{*}))\) again. Note that since \(\text{X}^{*}\) is \(l_{1}\)-column-flat, no column of \(\text{X}^{*}\) is 0; furthermore \(\text{M}^{*}\) has no 0-column, so \(\text{M}^{*} = (\lambda_{1}\mathbf{u}_{1},\ldots,\lambda_{n}\mathbf{u}_{n})\) with \(\lambda_{j}>0\) for all \(j\), \(\lambda_{1}+\ldots+\lambda_{n}=1\) and \(|\mathbf{u}_{j}|_{\infty}=1\); as a result, \(|||\text{M}^{*}|||_{1}^{*} = \sum_{j}\lambda_{j}|\mathbf{u}_{j}|_{\infty}=1\). Hence
\(1 = |||\text{M}^{*}|||_{1}^{*} \le \gamma^{*}|||\varPhi_{S}^{T}(\varPhi_{S}(\text{X}^{*})-\mathbf{y})|||_{1}^{*} \le \gamma^{*}N(\varPhi_{S}^{T}: l_{2}\rightarrow|||\cdot|||_{1}^{*})\,|\varPhi_{S}(\text{X}^{*})-\mathbf{y}|_{2} = \gamma^{*}\eta N(\varPhi_{S}^{T}: l_{2}\rightarrow|||\cdot|||_{1}^{*})\)
i.e., \(\gamma^{*-1} \le \eta N(\varPhi_{S}^{T}: l_{2}\rightarrow|||\cdot|||_{1}^{*})\) (52)
Replacing \(\gamma^{*-1}\) with its upper bound in (52), we obtain: if \(\min_{(i,j)\in S}|\text{Y}^{*}_{ij}| > \eta N(\varPhi_{S}^{T}: l_{2}\rightarrow|||\cdot|||^{*}_{1})\,N((\varPhi_{S}^{T}\varPhi_{S})^{-1}: |||\cdot|||_{1}^{*}\rightarrow|||\cdot|||_{max})\), then \(sgn(\text{X}^{*}_{S,ij}) = sgn(\text{Y}^{*}_{ij})\) for all \((i,j)\) in S.
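The norm pair used throughout this argument can be made concrete. The following minimal numpy sketch (illustrative only; the helper names `flat_norm` and `dual_flat_norm` are hypothetical, introduced here for clarity) evaluates \(|||\text{X}|||_{1}\) as the largest column \(l_{1}\)-norm and its dual \(|||\text{M}|||_{1}^{*}\) as the sum of column \(l_{\infty}\)-norms, and checks the Hölder-type pairing \({<}\text{M},\text{X}{>} \le |||\text{M}|||_{1}^{*}\,|||\text{X}|||_{1}\) on random instances:

```python
import numpy as np

def flat_norm(X):
    # |||X|||_1 : the largest column l1-norm, max_j |x_j|_1
    return np.abs(X).sum(axis=0).max()

def dual_flat_norm(M):
    # |||M|||_1^* : the sum of column l_inf-norms, sum_j |m_j|_inf
    return np.abs(M).max(axis=0).sum()

rng = np.random.default_rng(0)
n = 8
ok = True
for _ in range(500):
    X = rng.standard_normal((n, n))
    M = rng.standard_normal((n, n))
    # Hoelder-type duality: <M, X> <= |||M|||_1^* |||X|||_1
    if float(np.sum(M * X)) > dual_flat_norm(M) * flat_norm(X) + 1e-9:
        ok = False
print(ok)
```

On the identity matrix, every column has unit \(l_{1}\)-norm, so \(|||\text{I}|||_{1}=1\) while \(|||\text{I}|||_{1}^{*}=n\), matching the subdifferential structure (\(\sum_{j}\lambda_{j}|\mathbf{u}_{j}|_{\infty}\)) used above.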
(3) \(\text{Y}^{*} = \varPhi_{S}^{*-1}(\mathbf{y})\in\sum _s^{n\times {}n}(S)\) implies \(\varPhi_{S}^{T}(\varPhi_{S}(\text{Y}^{*})-\mathbf{y}) = \text{O}\), and then condition (1) leads to \(\varPhi_{S}(\text{Y}^{*}) = \mathbf{y}\). Furthermore, \(\varPhi_{S}^{T}\varPhi_{S}\) is a bijection \(\sum _s^{n\times {}n}(S)\rightarrow\sum _s^{n\times {}n}(S)\) and \(\text{X}^{*}-\text{Y}^{*}\in\sum _s^{n\times {}n}(S)\), so for any matrix norm \(|\cdot|_{\alpha}\):
\(|\text{X}^{*}-\text{Y}^{*}|_{\alpha} = |(\varPhi_{S}^{T}\varPhi_{S})^{-1}(\varPhi_{S}^{T}\varPhi_{S})(\text{X}^{*}-\text{Y}^{*})|_{\alpha} = |\varPhi_{S}^{*-1}\varPhi_{S}(\text{X}^{*}-\text{Y}^{*})|_{\alpha} = |\varPhi_{S}^{*-1}(\varPhi_{S}(\text{X}^{*})-\mathbf{y})|_{\alpha}\)
\(\le N(\varPhi_{S}^{*-1}: l_{2}\rightarrow|\cdot|_{\alpha})\,|\varPhi_{S}(\text{X}^{*})-\mathbf{y}|_{2} = \eta N(\varPhi_{S}^{*-1}: l_{2}\rightarrow|\cdot|_{\alpha})\) \(\square\)
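This operator-norm bound can be illustrated numerically. The sketch below (an illustration, not the paper's construction: it assumes a generic Gaussian measurement operator on vectorized matrices, takes \(|\cdot|_{\alpha}\) to be the \(l_{2}\)/Frobenius norm, and realizes \(\varPhi_{S}^{*-1}=(\varPhi_{S}^{T}\varPhi_{S})^{-1}\varPhi_{S}^{T}\) as a pseudo-inverse) verifies \(|\varPhi_{S}^{*-1}(\mathbf{z})|_{2}\le N\,|\mathbf{z}|_{2}\) with \(N\) the largest singular value of the pseudo-inverse:

```python
import numpy as np

rng = np.random.default_rng(1)
n, s, m = 6, 2, 40
# a generic linear measurement operator Phi: R^{n x n} -> R^m acting on vec(X)
Phi = rng.standard_normal((m, n * n)) / np.sqrt(m)
# a column-wise support S with s entries per column
S = np.zeros((n, n), dtype=bool)
for j in range(n):
    S[rng.choice(n, size=s, replace=False), j] = True
idx = np.flatnonzero(S.ravel(order="F"))
Phi_S = Phi[:, idx]                  # Phi restricted to matrices supported on S
pinv = np.linalg.pinv(Phi_S)         # Phi_S^{*-1} = (Phi_S^T Phi_S)^{-1} Phi_S^T
N_op = np.linalg.norm(pinv, 2)       # N(Phi_S^{*-1}: l2 -> l2), largest singular value
eta = 0.1
ok = True
for _ in range(100):
    e = rng.standard_normal(m)
    e *= eta / np.linalg.norm(e)                     # residual with |e|_2 = eta
    if np.linalg.norm(pinv @ e) > eta * N_op + 1e-10:
        ok = False
print(ok)
```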
Proof of Theorem 6. (1) Note that in the case \(\text{X}\in\sum _s^{n\times {}n}(R)\) and \(\mathbf{y} = \varPhi(\text{X})+{\textit{\textbf{e}}} = \varPhi_{R}(\text{X})+{\textit{\textbf{e}}}\), \(|{\textit{\textbf{e}}}|_{2}\le\eta\), we have
It is straightforward to verify that in this situation condition (3) of this theorem leads to condition (3) in Lemma 3: \(\sup\{{<}\varPhi_{\sim R}^{T}(\varPhi_{R}\varPhi_{R}^{*-1}(\mathbf{y})-\mathbf{y}),\text{H}{>}: |||\text{H}|||_{1}=1\} < \eta\varLambda_{min}(\varPhi_{R}^{T})(1-N(\varPhi_{R}^{*-1}\varPhi_{\sim R}: |||\cdot|||_{1}^{*}\rightarrow|||\cdot|||_{1}^{*}))\)
for any \(\eta\). As a result, \(\text{X}^{*}\in\sum _s^{n\times {}n}(R)\), is \(l_{1}\)-column-flat, and is the unique minimizer of \(\text{MP}_{y,\varPhi,\eta}\).
(2) For \(\text{Y}^{*} = \varPhi_{R}^{*-1}(\mathbf{y})\in\sum _s^{n\times {}n}(R)\) and by Lemma 3(4), we obtain \(|\text{X}^{*}-\text{Y}^{*}|_{\alpha}\le\eta N(\varPhi_{R}^{*-1}: l_{2}\rightarrow|\cdot|_{\alpha})\) for any given matrix norm \(|\cdot|_{\alpha}\). On the other hand, \(\text{Y}^{*} = \varPhi_{R}^{*-1}(\mathbf{y})\) implies \(\varPhi_{R}^{T}(\varPhi_{R}(\text{Y}^{*})-\mathbf{y}) = \text{O}\), and then condition (1) leads to \(\varPhi_{R}(\text{Y}^{*}) = \mathbf{y}\); hence \(\varPhi_{R}(\text{Y}^{*}) = \mathbf{y} = \varPhi(\text{X})+{\textit{\textbf{e}}} = \varPhi_{R}(\text{X})+{\textit{\textbf{e}}}\), namely \(\varPhi_{R}^{T}\varPhi_{R}(\text{Y}^{*}) = \varPhi_{R}^{T}\varPhi_{R}(\text{X})+\varPhi_{R}^{T}({\textit{\textbf{e}}})\); as a result:
Since \(|{\textit{\textbf{e}}}|_{2}\le\eta\), we get \(|\text{Y}^{*}-\text{X}|_{\alpha}\le\eta N(\varPhi_{R}^{*-1}: l_{2}\rightarrow|\cdot|_{\alpha})\) for any given matrix norm \(|\cdot|_{\alpha}\). Combining this with \(|\text{X}^{*}-\text{Y}^{*}|_{\alpha}\le\eta N(\varPhi_{R}^{*-1}: l_{2}\rightarrow|\cdot|_{\alpha})\), we get the reconstruction error bound \(|\text{X}^{*}-\text{X}|_{\alpha}\le 2\eta N(\varPhi_{R}^{*-1}: l_{2}\rightarrow|\cdot|_{\alpha})\).
(3) By the first-order optimality condition on the minimizer \(\text{X}^{*}\), together with the fact \(\text{supp}(\text{X}^{*}) = \text{R}\), we have \(\text{X}^{*} = \varPhi_{R}^{*-1}(\mathbf{y}) - \gamma^{*-1}(\varPhi_{R}^{T}\varPhi_{R})^{-1}(\text{M}^{*}) = \text{Y}^{*} - \gamma^{*-1}(\varPhi_{R}^{T}\varPhi_{R})^{-1}(\text{M}^{*})\) where \(\text{M}^{*}\in\partial|||\text{X}^{*}|||_{1}\), namely:
Combining with (53), we get
Since \(sgn(\text{X}^{*}_{ij}) = sgn(\text{X}_{ij})\) whenever \(|\text{X}_{ij}| > |\text{X}_{ij}-\text{X}^{*}_{ij}| = |\varPhi_{R}^{*-1}({\textit{\textbf{e}}})_{ij} - \gamma^{*-1}(\varPhi_{R}^{T}\varPhi_{R})^{-1}(\text{M}^{*})_{ij}|\), in particular, if \(\text{X}_{ij}\) satisfies \(|\text{X}_{ij}| > \max_{ij}|\varPhi_{R}^{*-1}({\textit{\textbf{e}}})_{ij}| + \gamma^{*-1}\max_{ij}|(\varPhi_{R}^{T}\varPhi_{R})^{-1}(\text{M}^{*})_{ij}|\), then the former inequality holds and as a result \(sgn(\text{X}^{*}_{ij}) = sgn(\text{X}_{ij})\). It is straightforward to verify (using (52)) that condition (3) provides exactly this guarantee. \(\square\)
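The elementary scalar implication driving this step, that \(|x| > |x - z|\) forces \(sgn(z) = sgn(x)\), can be sanity-checked directly (a trivial illustration, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(2)
ok = True
for _ in range(10000):
    x, z = rng.standard_normal(2)
    # if z is closer to x than 0 is (|x| > |x - z|), then z lies on x's side of 0
    if abs(x) > abs(x - z) and np.sign(x) != np.sign(z):
        ok = False
print(ok)
```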
Appendix C: Proofs of Theorems in Section 5
Proof of Lemma 4. We start with (FACT 4): \(w^{2}(\text{D}(|||\cdot|||_{1},\text{X})) \le \text{E}_{G}[\inf\{|\text{G}-t\text{V}|_{F}^{2}: t>0,\ \text{V}\in\partial|||\text{X}|||_{1}\}]\), where G is a random matrix with entries \(\text{G}_{ij}\sim^{iid} N(0,1)\).
Set \(\text{G}=(\mathbf{g}_{1},\ldots,\mathbf{g}_{n})\) where \(\mathbf{g}_{j}\sim^{iid} N(0,\text{I}_{n})\). By Lemma 1, \(\text{V}=(\lambda_{1}\xi_{1},\ldots,\lambda_{n}\xi_{n})\) where w.l.o.g. \(\lambda_{j}\ge 0\) for \(j=1,\ldots,r\), \(\lambda_{1}+\ldots+\lambda_{r}=1\), \(\lambda_{j}=0\) for \(j\ge r+1\); \(|\mathbf{x}_{j}|_{1}=\max_{k}|\mathbf{x}_{k}|_{1}\) for \(j=1,\ldots,r\) and \(|\mathbf{x}_{j}|_{1}<\max_{k}|\mathbf{x}_{k}|_{1}\) for \(j\ge r+1\); \(\xi_{j}(i)=sgn(\text{X}_{ij})\) for \(\text{X}_{ij}\ne 0\) and \(|\xi_{j}(i)|\le 1\) for all i and j. Then
For each \(j=1,\ldots,r\) let \(S(j)\) be the support of \(\mathbf{x}_{j}\) (so \(|S(j)|\le s\)) and \(\sim S(j)\) be its complementary set; then \(|\mathbf{g}_{j}-t\lambda_{j}\xi_{j}|_{2}^{2} = |\mathbf{g}_{j|S(j)}-t\lambda_{j}\xi_{j|S(j)}|_{2}^{2} + |\mathbf{g}_{j|\sim S(j)}-t\lambda_{j}\xi_{j|\sim S(j)}|_{2}^{2}\). Notice that all components of \(\xi_{j|S(j)}\) are \(\pm 1\) and all components of \(\xi_{j|\sim S(j)}\) can be any value in the interval \([-1,+1]\). Select \(\lambda_{1}=\ldots=\lambda_{r}=1/r\), let \(\varepsilon>0\) be an arbitrarily small positive number and select \(t=t(\varepsilon)\) such that \(\text{P}[|g|>t(\varepsilon)/r]\le\varepsilon\), where g is a standard scalar Gaussian random variable (i.e., \(g\sim N(0,1)\), and \(\varepsilon\) can be taken as \(\exp(-t(\varepsilon)^{2}/2r^{2})\)). For each j and each i outside \(S(j)\), set \(\xi_{j}\)'s component \(\xi_{j}(i)=rg_{j}(i)/t(\varepsilon)\) if \(|g_{j}(i)|\le t(\varepsilon)/r\) (in this case \(|g_{j}(i)-t\lambda_{j}\xi_{j}(i)|=0\)) and otherwise \(\xi_{j}(i)=sgn(g_{j}(i))\) (in this case \(|g_{j}(i)-t\lambda_{j}\xi_{j}(i)|=|g_{j}(i)|-t(\varepsilon)/r\)); then \(|\mathbf{g}_{j|\sim S(j)}-t\lambda_{j}\xi_{j|\sim S(j)}|_{2}^{2}=0\) when \(|\mathbf{g}_{j|\sim S(j)}|_{\infty}<t(\varepsilon)/r\), hence:
\(\text{E}[|\mathbf{g}_{j|\sim S(j)}-t\lambda_{j}\xi_{j|\sim S(j)}|_{2}^{2}] = \int_{0}^{\infty}du\,\text{P}[|\mathbf{g}_{j|\sim S(j)}-t\lambda_{j}\xi_{j|\sim S(j)}|_{2}^{2}>u]\)
\(=2\int_{0}^{\infty}du\,u\,\text{P}[|\mathbf{g}_{j|\sim S(j)}-t\lambda_{j}\xi_{j|\sim S(j)}|_{2}>u]\)
\(\le 2\int_{0}^{\infty}du\,u\,\text{P}[\text{there exists a component of }(\mathbf{g}_{j|\sim S(j)}-t\lambda_{j}\xi_{j|\sim S(j)})\text{ with magnitude}>(n-s)^{-1/2}u]\)
\(\le 2(n-s)\int_{0}^{\infty}du\,u\,\text{P}[|g|-t(\varepsilon)/r>(n-s)^{-1/2}u]\)
\(\le 2(n-s)\int_{0}^{\infty}du\,u\exp(-((t(\varepsilon)/r)+(n-s)^{-1/2}u)^{2}/2)\)
\(\le C_{0}(n-s)^{2}\exp(-t(\varepsilon)^{2}/2r^{2}) \le C_{0}(n-s)^{2}\varepsilon\)
where C\(_{0}\) is an absolute constant. On the other hand:
\(\text{E}_{g_j}[|\mathbf{g}_{j|S(j)}-t\lambda_{j}\xi_{j|S(j)}|_{2}^{2}] = \text{E}_{g_j}[|\mathbf{g}_{j|S(j)}|_{2}^{2}] + (t(\varepsilon)^{2}/r^{2})|\xi_{j|S(j)}|_{2}^{2} = (1+t(\varepsilon)^{2}/r^{2})s = (1+2\log(1/\varepsilon))s\)
Hence \(w^{2}(\text{D}(|||\cdot|||_{1},\text{X})) \le (1+2\log(1/\varepsilon))rs + (n-r)n + r(n-s)^{2}\varepsilon \le n^{2} - r(n-s\log(e/\varepsilon^{2})) + C_{0}n^{2}r\varepsilon\)
In particular, let \(\varepsilon=1/C_{0}n^{2}r\); then we get \(w^{2}(\text{D}(|||\cdot|||_{1},\text{X})) \le n^{2} - r(n-s\log(Cn^{4}r^{2})) + 1\). \(\square\)
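The construction in this proof can be simulated. The Python sketch below (illustrative only: the signal's supports are fixed to the first s rows, \(\xi=+1\) on-support, and n, r, s are arbitrary choices) plugs in the proof's \(\lambda_{j}=1/r\), \(t(\varepsilon)=r\sqrt{2\log(1/\varepsilon)}\) and the clipped off-support \(\xi\), and checks that the resulting Monte Carlo width estimate falls strictly below the trivial bound \(n^{2}\):

```python
import numpy as np

rng = np.random.default_rng(3)
n, r, s = 100, 5, 3
eps = 1.0 / (n * n * r)
t = r * np.sqrt(2 * np.log(1 / eps))    # t(eps): P[|g| > t/r] <= exp(-t^2/2r^2) = eps
vals = []
for _ in range(200):
    G = rng.standard_normal((n, n))
    V = np.zeros((n, n))
    for j in range(r):                  # only the r "flat" columns get weight lambda_j = 1/r
        V[:s, j] = 1.0                  # on-support: xi = sgn(X_ij) = +1 (fixed, independent of G)
        g_off = G[s:, j]
        V[s:, j] = np.clip(r * g_off / t, -1.0, 1.0)  # off-support: cancel G wherever |g| <= t/r
        V[:, j] /= r
    vals.append(np.linalg.norm(G - t * V, "fro") ** 2)
w2_est = float(np.mean(vals))
print(w2_est < n * n)                   # strictly below the trivial bound n^2
```

The estimate is dominated by the \((n-r)n\) term from the non-flat columns, in line with the bound \(n^{2}-r(n-s\log(Cn^{4}r^{2}))+1\).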
Proof of Theorem 9. For any \(s<n\), there exist \(k\ge(n/4s)^{ns/2}\) subsets \(\text{S}^{(\alpha\beta\ldots\omega)} = \text{S}_{1}^{(\alpha)}\cup\text{S}_{2}^{(\beta)}\cup\ldots\cup\text{S}_{n}^{(\omega)}\) in \(\{(i,j): 1\le i,j\le n\}\), where each \(\text{S}_{j}^{(\mu)}=\{(i_{1},j),\ldots,(i_{s},j): 1\le i_{1}<i_{2}<\ldots<i_{s}\le n\}\) and \(|\text{S}_{j}^{(\mu)}\cap\text{S}_{j}^{(\nu)}|<s/2\) for \(\mu\ne\nu\). This fact is based on a combinatorial theorem [11]: for any \(s<n\) there exist \(l\ge(n/4s)^{s/2}\) subsets \(\text{R}^{(\mu)}\) of \(\{1,2,\ldots,n\}\) with \(|\text{R}^{(\mu)}\cap\text{R}^{(\nu)}|<s/2\) for any \(\mu\ne\nu\). For the n-by-n square \(\{(i,j): 1\le i,j\le n\}\), assign an \(\text{R}^{(\mu)}\) to each column, i.e., set \(\text{S}_{j}^{(\mu)}:=\{(i,j): i\in\text{R}^{(\mu)}\}\). As a result, \(|\text{S}_{j}^{(\mu)}\cap\text{S}_{j}^{(\nu)}|<s/2\) for \(\mu\ne\nu\) since \(|\text{R}^{(\mu)}\cap\text{R}^{(\nu)}|<s/2\) for \(\mu\ne\nu\), and in total there can be \(k=l^{n}\) such assignments \(\text{S}^{(\alpha\beta\ldots\omega)}=\text{S}_{1}^{(\alpha)}\cup\text{S}_{2}^{(\beta)}\cup\ldots\cup\text{S}_{n}^{(\omega)}\) on the square.
We call such a union \(\text{S}_{1}^{(\alpha)}\cup\text{S}_{2}^{(\beta)}\cup\ldots\cup\text{S}_{n}^{(\omega)}\) a configuration on the n-by-n square. Let m be the rank of the linear operator \(\varPhi\). Consider the quotient space \(L:=R^{n\times n}/\ker\varPhi\); then \(\dim L = n^{2}-\dim\ker\varPhi = m\). For any [X] in L define the norm \(|[\text{X}]|:=\inf\{|||\text{X}-\text{V}|||_{1}: \text{V}\in\ker\varPhi\}\). For any \(\text{X}=(\mathbf{x}_{1},\ldots,\mathbf{x}_{n})\) with \(\mathbf{x}_{j}\in\sum^{2s}\) for all j, the assumption about \(\varPhi\) implies \(|[\text{X}]|=|||\text{X}|||_{1}\). Now for any configuration \(\varDelta=\text{S}_{1}\cup\text{S}_{2}\cup\ldots\cup\text{S}_{n}\) on the n-by-n square, define \(\text{X}_{ij}(\varDelta):=1/s\) if \((i,j)\in\text{S}_{j}\) and 0 otherwise; then \(|||\text{X}(\varDelta)|||_{1}=1\), each column \(\mathbf{x}_{j}(\varDelta)\in\sum^{s}\), and each column of \(\text{X}(\varDelta')-\text{X}(\varDelta'')\) is in \(\sum^{2s}\); furthermore \(|[\text{X}(\varDelta')]-[\text{X}(\varDelta'')]| = |||\text{X}(\varDelta')-\text{X}(\varDelta'')|||_{1}>1\) because of the property \(|\text{S}_{j}'\cap\text{S}_{j}''|<s/2\) for \(\text{S}_{j}'\ne\text{S}_{j}''\). These facts imply that the set \(\varTheta:=\{[\text{X}(\varDelta)]: \varDelta\) runs over all configurations\(\}\) is a subset of the unit sphere of the normed quotient space L whose members are pairwise at distance \(>1\), i.e., a d-separated set on the sphere with \(d>1\).
The cardinality of \(\varTheta\) equals the number of configurations \(k\ge(n/4s)^{ns/2}\), and an elementary volumetric estimate gives \(k\le 3^{\dim L}=3^{m}\); hence \(m\ge C_{1}ns\log(C_{2}n/s)\) where \(C_{1}=1/(2\log 3)\) and \(C_{2}=1/4\). \(\square\)
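The packing of column supports used above can be illustrated by a greedy random construction (my own illustrative procedure, not the construction of [11]; n, s and the trial count are arbitrary). It collects s-subsets of \(\{1,\ldots,n\}\) whose pairwise intersections stay below \(s/2\) and compares their number against the lower bound \((n/4s)^{s/2}\):

```python
import numpy as np

rng = np.random.default_rng(4)
n, s = 30, 4
packing = []
for _ in range(20000):
    cand = frozenset(rng.choice(n, size=s, replace=False).tolist())
    # keep cand only if it intersects every kept subset in fewer than s/2 points
    if all(len(cand & R) < s / 2 for R in packing):
        packing.append(cand)
k = len(packing)
lower = (n / (4 * s)) ** (s / 2)   # the combinatorial lower bound (n/4s)^{s/2}
print(k >= lower)
```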
Appendix D: Proofs of Theorems in Section 6
Proof of Lemma 5. We start with an inequality similar to that in FACT 4 (the proof is also similar): \(W^{2}(\varGamma_{X};\varPhi_{A,B}) \le \text{E}_{H}[\inf\{|\text{H}-t\text{V}|_{F}^{2}: t>0,\ \text{V}\in\partial|||\text{X}|||_{1}\}]\). With the same specifications for \(\text{V}=(\lambda_{1}\xi_{1},\ldots,\lambda_{n}\xi_{n})\) as in Lemma 4, i.e. (w.l.o.g.) \(\lambda_{j}\ge 0\) for \(j=1,\ldots,r\), \(\lambda_{1}+\ldots+\lambda_{r}=1\), \(\lambda_{j}=0\) for \(j\ge r+1\); \(|\mathbf{x}_{j}|_{1}=\max_{k}|\mathbf{x}_{k}|_{1}\) for \(j=1,\ldots,r\) and \(|\mathbf{x}_{j}|_{1}<\max_{k}|\mathbf{x}_{k}|_{1}\) for \(j\ge r+1\); \(\xi_{j}(i)=sgn(\text{X}_{ij})\) for \(\text{X}_{ij}\ne 0\) and \(|\xi_{j}(i)|\le 1\) for all i and j. Let \(\textit{\textbf{h}}_{j}\equiv\sum_{l=1}^{m}m^{-1}B_{lj}\text{A}^{T}\varepsilon_{l}\); we have
\(W^{2}(\varGamma_{X};\varPhi_{A,B})\)
\(\le \text{E}_{A,B,E}[\inf_{t>0,\ \lambda_j,\xi_j\text{ specified as above}}\sum_{j=1}^{n}|\sum_{l=1}^{m}m^{-1}B_{lj}\text{A}^{T}\varepsilon_{l}-t\lambda_{j}\xi_{j}|_{2}^{2}]\)
\(=\sum_{j=r+1}^{n}\text{E}_{A,B,E}[|\textit{\textbf{h}}_{j}|_{2}^{2}] + \text{E}_{A,B,E}[\inf_{t>0,\ \lambda_j,\xi_j\text{ specified as above}}\sum_{j=1}^{r}|\textit{\textbf{h}}_{j}-t\lambda_{j}\xi_{j}|_{2}^{2}]\)
\(=\text{I}+\text{II}\)
The first and second terms are estimated respectively. The first term
\(\text{I} = \sum_{j=r+1}^{n}m^{-2}\sum_{l,k=1}^{m}\text{E}_{B}[B_{lj}B_{kj}]\,\text{E}_{A,E}[\varepsilon_{l}^{T}\text{AA}^{T}\varepsilon_{k}] = m^{-2}(n-r)\sum_{l,k=1}^{m}\delta_{lk}\,\text{E}_{A,E}[\varepsilon_{l}^{T}\text{AA}^{T}\varepsilon_{l}] = (n-r)n\)
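The identity \(\text{E}[|\textit{\textbf{h}}_{j}|_{2}^{2}]=n\) behind this computation can be checked by Monte Carlo. A sketch under the reading this computation suggests (A an m-by-n standard Gaussian matrix, \(B_{lj}\) Rademacher, \(\varepsilon_{l}\) Rademacher vectors in \(R^{m}\); the sizes are arbitrary illustration choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, trials = 20, 15, 2000
acc = 0.0
for _ in range(trials):
    A = rng.standard_normal((m, n))             # A: m x n with iid N(0,1) entries
    B = rng.choice([-1.0, 1.0], size=m)         # B_lj for one fixed column j: Rademacher
    E = rng.choice([-1.0, 1.0], size=(m, m))    # rows are the Rademacher vectors eps_l in R^m
    h = sum(B[l] * (A.T @ E[l]) for l in range(m)) / m   # h_j = m^{-1} sum_l B_lj A^T eps_l
    acc += float(h @ h)
mean_sq = acc / trials
print(abs(mean_sq - n) / n < 0.2)               # E|h_j|_2^2 = n, so I = (n - r) n
```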
To estimate II, for each \(j=1,\ldots,r\) let \(S(j)\) be the support of \(\mathbf{x}_{j}\) (so \(|S(j)|\le s\)) and \(\sim S(j)\) be its complementary set; then
\(\sum_{j=1}^{r}|\textit{\textbf{h}}_{j}-t\lambda_{j}\xi_{j}|_{2}^{2} = \sum_{j=1}^{r}|\textit{\textbf{h}}_{j|S(j)}-t\lambda_{j}\xi_{j|S(j)}|_{2}^{2} + \sum_{j=1}^{r}|\textit{\textbf{h}}_{j|\sim S(j)}-t\lambda_{j}\xi_{j|\sim S(j)}|_{2}^{2}\)
Notice that all components of \(\xi_{j|S(j)}\) are \(\pm 1\) and all components of \(\xi_{j|\sim S(j)}\) can be any value in the interval \([-1,+1]\). Select \(\lambda_{1}=\ldots=\lambda_{r}=1/r\), let \(\delta>0\) be an arbitrarily small positive number and select \(t=t(\delta)\) such that \(\text{P}_{A,B,E}[|h|>t(\delta)/r]\le\delta\), where h is a scalar random variable such that \(h_{j}(i)\sim h\), i indicating the i-th component of the vector \(\textit{\textbf{h}}_{j}\). For each j and i outside \(S(j)\), set \(\xi_{j}\)'s component \(\xi_{j}(i)=rh_{j}(i)/t(\delta)\) if \(|h_{j}(i)|\le t(\delta)/r\) and otherwise \(\xi_{j}(i)=sgn(h_{j}(i))\); then \(|\textit{\textbf{h}}_{j|\sim S(j)}-t\lambda_{j}\xi_{j|\sim S(j)}|_{2}^{2}=0\) when \(|\textit{\textbf{h}}_{j|\sim S(j)}|_{\infty}<t(\delta)/r\). Notice also the fact that for independent standard scalar Gaussian variables \(a_{l},b_{l}\) and Rademacher variables \(\varepsilon_{l}\), \(l=1,\ldots,m\), there exists an absolute constant c such that for any \(\eta>0\):
as a result, in the above expression \(\delta\) can be taken as \(c\exp(-t(\delta)/r)\), and:
\(\text{E}[|\textit{\textbf{h}}_{j|\sim S(j)}-t\lambda_{j}\xi_{j|\sim S(j)}|_{2}^{2}] = \int_{0}^{\infty}du\,\text{P}[|\textit{\textbf{h}}_{j|\sim S(j)}-t\lambda_{j}\xi_{j|\sim S(j)}|_{2}^{2}>u]\)
\(=2\int_{0}^{\infty}du\,u\,\text{P}[|\textit{\textbf{h}}_{j|\sim S(j)}-t\lambda_{j}\xi_{j|\sim S(j)}|_{2}>u]\)
\(\le 2\int_{0}^{\infty}du\,u\,\text{P}[\text{there exists a component of }(\textit{\textbf{h}}_{j|\sim S(j)}-t\lambda_{j}\xi_{j|\sim S(j)})\text{ with magnitude}>(n-s)^{-1/2}u]\)
\(\le 2(n-s)\int_{0}^{\infty}du\,u\,\text{P}[|h|-t(\delta)/r>(n-s)^{-1/2}u]\)
\(\le 2(n-s)\int_{0}^{\infty}du\,u\exp(-((t(\delta)/r)+(n-s)^{-1/2}u))\)
\(\le C_{0}(n-s)^{2}\exp(-(t(\delta)/r)) \le C_{0}(n-s)^{2}\delta\)
where \(C_{0}\) is an absolute constant. On the other hand, \(|\xi_{j|S(j)}|_{2}^{2}\le s\) for \(j=1,\ldots,r\), so:
\(\text{E}_{A,B,E}[\inf_{t>0,\lambda_j,\xi_j}\sum_{j=1}^{r}|\textit{\textbf{h}}_{j|S(j)}-t\lambda_{j}\xi_{j|S(j)}|_{2}^{2}]\)
\(\le \text{E}_{A,B,E}[\sum_{j=1}^{r}|\textit{\textbf{h}}_{j|S(j)}-t(\delta)\xi_{j|S(j)}/r|_{2}^{2}]\)
\(\le \sum_{j=1}^{r}\text{E}_{A,B,E}[m^{-2}|\sum_{l=1}^{m}B_{lj}(\text{A}^{T}\varvec{\varepsilon }_{l})_{|S(j)}|_{2}^{2}] + rst(\delta)^{2}/r^{2}\)
\(=rs(1+t(\delta)^{2}/r^{2})\)
hence \(\text{II}\le rs(1+t(\delta)^{2}/r^{2}) + C_{0}(n-s)^{2}r\delta\). Combining all the above estimates, we have:
\(W^{2}(\varGamma_{X};\varPhi_{A,B}) \le \text{I}+\text{II} \le (n-r)n + rs(1+t(\delta)^{2}/r^{2}) + C_{0}n^{2}r\delta = n^{2} - r(n-s(1+t(\delta)^{2}/r^{2})) + C_{0}n^{2}r\delta\)
Substituting \(t(\delta)/r = \log(c/\delta)\), we get, for any \(\delta>0\):
In particular, let \(\delta=1/C_{0}n^{2}r\); then \(W^{2}(\varGamma_{X};\varPhi_{A,B}) \le n^{2}-r(n-s(1+\log^{2}(cn^{2}r)))+1\). \(\square\)
Proof of Lemma 6. We use the second-moment inequality \(\text{P}[\text{Z}\ge\xi] \ge (\text{E}[\text{Z}]-\xi)_{+}^{2}/\text{E}[\text{Z}^{2}]\), valid for any non-negative random variable Z and any \(\xi>0\). Setting \(\text{Z}=|{<}\text{M},\text{U}{>}|^{2}\) and \(\xi=\text{E}[|{<}\text{M},\text{U}{>}|^{2}]/2\), we get:
To estimate the upper bound of \(\text{E}[|{<}\text{M},\text{U}{>}|^{2}]\), let \(\text{U}=\sum_{j}\lambda_{j}\mathbf{u}_{j}\mathbf{v}_{j}^{T}\) be U's singular value decomposition, with \(\mathbf{u}_{i}^{T}\mathbf{u}_{j}=\mathbf{v}_{i}^{T}\mathbf{v}_{j}=\delta_{ij}\) and \(\lambda_{j}>0\) for each j. Notice that \(\text{M}=\mathbf{ab}^{T}\) where \(\mathbf{a}\sim\mathbf{b}\sim N(0,I_{n})\) are independent of each other; then \({<}\text{M},\text{U}{>} = \mathbf{a}^{T}\text{U}\mathbf{b} = \sum_{j}\lambda_{j}\mathbf{a}^{T}\mathbf{u}_{j}\mathbf{v}_{j}^{T}\mathbf{b}\) where \(\mathbf{a}^{T}\mathbf{u}_{j}\sim\mathbf{v}_{j}^{T}\mathbf{b}\sim N(0,1)\) are independent of each other; hence \(\text{E}[|{<}\text{M},\text{U}{>}|^{2}] = \sum_{j}\lambda_{j}^{2}\text{E}[|\mathbf{a}^{T}\mathbf{u}_{j}|^{2}]\text{E}[|\mathbf{v}_{j}^{T}\mathbf{b}|^{2}] = \sum_{j}\lambda_{j}^{2} = |\text{U}|_{F}^{2}=1\) for U as in the assumption.
On the other hand, by Gaussian hypercontractivity we have
In conclusion, \(\text{P}[|{<}\text{M},\text{U}{>}|^{2}\ge 1/2] = \text{P}[|{<}\text{M},\text{U}{>}|^{2}\ge\text{E}[|{<}\text{M},\text{U}{>}|^{2}]/2] \ge c\) for U with \(|\text{U}|_{F}^{2}=1\). \(\square\)
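Lemma 6's conclusion can be probed numerically. A sketch (illustrative only; the threshold 0.1 is a loose empirical stand-in for the absolute constant c, and the sampled U is one arbitrary unit-Frobenius matrix):

```python
import numpy as np

rng = np.random.default_rng(6)
n, trials = 10, 20000
U = rng.standard_normal((n, n))
U /= np.linalg.norm(U, "fro")           # normalize so |U|_F = 1
hits = 0
for _ in range(trials):
    a = rng.standard_normal(n)
    b = rng.standard_normal(n)
    z = a @ U @ b                       # <M, U> with M = a b^T, a, b ~ N(0, I_n)
    if z * z >= 0.5:
        hits += 1
p = hits / trials
print(p > 0.1)                          # a uniform positive lower bound, as the lemma asserts
```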
The proof of Lemma 7 is logically the same as the proof of Lemma 5; the only difference concerns the distribution tail of the components of the vectors \(\textit{\textbf{h}}_{j}\equiv\sum_{l=1}^{m}m^{-1}B_{lj}\text{A}^{T}\varepsilon_{l}\), which are \(\sim^{iid} h\equiv m^{-1}\sum_{l,k=1}^{m}b_{l}a_{k}\varepsilon_{k}\) with independent scalar sub-Gaussian variables \(a_{l}\), \(b_{l}\) and Rademacher variables \(\varepsilon_{l}\), \(l=1,\ldots,m\). This auxiliary result is presented in the following lemma:
Lemma 8
For independent scalar zero-mean sub-Gaussian variables \(a_{l},b_{l}\) and Rademacher variables \(\varepsilon_{l}\), \(l=1,\ldots,m\), let \(\sigma_{A}\equiv\max_{l}|a_{l}|_{\psi_2}\) and \(\sigma_{B}\equiv\max_{l}|b_{l}|_{\psi_2}\) (\(|\cdot|_{\psi_2}\) denotes a sub-Gaussian variable's \(\psi_{2}\)-norm); then there exists an absolute constant c such that for any \(\eta>0\):
Proof
Notice that \(a_{k}\varepsilon_{k}\) is a zero-mean sub-Gaussian variable with \(|a_{k}\varepsilon_{k}|_{\psi_2}=|a_{k}|_{\psi_2}\); for \(b=m^{-1/2}\sum_{1\le l\le m}b_{l}\) and \(a=m^{-1/2}\sum_{1\le k\le m}a_{k}\varepsilon_{k}\) we have \(|b|_{\psi_2}\le Cm^{-1/2}(\sum_{l}|b_{l}|_{\psi_{2}}^{2})^{1/2}\le C\sigma_{B}\) and \(|a|_{\psi_2}\le Cm^{-1/2}(\sum_{k}|a_{k}|_{\psi_{2}}^{2})^{1/2}\le C\sigma_{A}\), where C is an absolute constant. Furthermore, because the product of two sub-Gaussian variables a and b is sub-exponential with \(\psi_{1}\)-norm \(|ba|_{\psi_1}\le|b|_{\psi_2}|a|_{\psi_2}\le C^{2}\sigma_{A}\sigma_{B}\), \(h\equiv m^{-1}\sum_{l,k=1}^{m}b_{l}a_{k}\varepsilon_{k}=ab\) has distribution tail \(\text{P}[|h|>\eta]<2\exp(-c\eta/\sigma_{A}\sigma_{B})\) where c is an absolute constant. \(\square\)
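Lemma 8's sub-exponential tail can likewise be sampled. A sketch with standard Gaussian \(a_{k},b_{l}\) (one special case of sub-Gaussian; the thresholds 3 and 6 are arbitrary illustration points, and the code uses the factorization \(h=ab\) established in the proof):

```python
import numpy as np

rng = np.random.default_rng(7)
m, trials = 50, 50000
# b = m^{-1/2} sum_l b_l and a = m^{-1/2} sum_k a_k eps_k, so h = a b
b = rng.standard_normal((trials, m)).sum(axis=1) / np.sqrt(m)
a = (rng.standard_normal((trials, m)) *
     rng.choice([-1.0, 1.0], size=(trials, m))).sum(axis=1) / np.sqrt(m)
h = a * b
# sub-exponential decay: the tail thins roughly like exp(-c eta), slower than a Gaussian tail
tail3 = float(np.mean(np.abs(h) > 3.0))
tail6 = float(np.mean(np.abs(h) > 6.0))
print(tail3 > tail6)
```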
Proof of Lemma 7. With the same logic as in the proof of Lemma 5 and based on Lemma 8, the auxiliary parameter \(\delta\) in the argument can be taken as \(2\exp(-ct(\delta)/r\sigma_{A}\sigma_{B})\), i.e., \(t(\delta)/r = c^{-1}\sigma_{A}\sigma_{B}\log(2/\delta)\), which yields the final result. \(\square\)
© 2020 Springer Nature Singapore Pte Ltd.
Tian, Y. (2020). Convex Reconstruction of Structured Matrix Signals from Linear Measurements: Theoretical Results. In: Zeng, J., Jing, W., Song, X., Lu, Z. (eds) Data Science. ICPCSEE 2020. Communications in Computer and Information Science, vol 1257. Springer, Singapore. https://doi.org/10.1007/978-981-15-7981-3_13