
Convex Reconstruction of Structured Matrix Signals from Linear Measurements: Theoretical Results

Conference paper. Data Science (ICPCSEE 2020). Part of the book series: Communications in Computer and Information Science (CCIS, volume 1257).

Abstract

The problem of reconstructing an n-by-n structured matrix signal \(\mathrm{X}=(\mathbf{x}_1,\ldots,\mathbf{x}_n)\) via convex optimization is investigated, where each column \(\mathbf{x}_j\) is s-sparse and all columns have the same \(l_1\)-norm value. The associated convex program is analyzed for both noise-free and noisy measurements. Uniform sufficient conditions are established which are very close to necessary conditions, and non-uniform conditions are also discussed. In addition, stronger conditions are investigated which guarantee the reconstructed signal's support stability, sign stability and approximation-error robustness. Moreover, with the convex-geometric approach in the random measurement setting, one of the critical ingredients of this contribution is an estimate of bounds on the related widths for Gaussian and non-Gaussian distributions. These bounds are explicitly controlled by the signal's structural parameters s and r, which determine the matrix signal's column-wise sparsity and \(l_1\)-column-flatness, respectively. This paper provides a relatively complete theory on column-wise sparse and \(l_1\)-column-flat matrix signal reconstruction, as well as a heuristic foundation for dealing with more complicated high-order tensor signals in, e.g., statistical big-data analysis and related data-intensive applications.
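
As a concrete illustration of the reconstruction program discussed in this paper, the following is a minimal numerical sketch (not the authors' code): it solves \(\mathrm{MP}_{\mathbf{y},\varPhi,\eta}\), i.e., minimize \(|||\mathrm{Z}|||_1=\max_j|\mathbf{z}_j|_1\) subject to \(|\mathbf{y}-\varPhi(\mathrm{Z})|_2\le\eta\), for a column-wise sparse and \(l_1\)-column-flat ground truth. The problem sizes, the random Gaussian measurement map and the CVXPY modeling are illustrative assumptions only.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, s, m = 8, 2, 40                 # signal size, column sparsity, number of measurements (arbitrary)

# Build an s-column-sparse, l1-column-flat ground truth X: every column has l1-norm 1.
X = np.zeros((n, n))
for j in range(n):
    idx = rng.choice(n, size=s, replace=False)
    v = rng.standard_normal(s)
    X[idx, j] = v / np.abs(v).sum()

# Random Gaussian measurement maps: Phi(Z)_k = <A_k, Z>.
A = rng.standard_normal((m, n, n)) / np.sqrt(m)
y = np.einsum('kij,ij->k', A, X)

# MP_{y,Phi,eta}: minimize |||Z|||_1 = max_j |z_j|_1  s.t.  |y - Phi(Z)|_2 <= eta.
Z = cp.Variable((n, n))
phiZ = cp.hstack([cp.sum(cp.multiply(A[k], Z)) for k in range(m)])
eta = 1e-6                         # (almost) noise-free measurements in this toy example
prob = cp.Problem(cp.Minimize(cp.max(cp.sum(cp.abs(Z), axis=0))),
                  [cp.norm(y - phiZ, 2) <= eta])
prob.solve()
print("relative reconstruction error:",
      np.linalg.norm(Z.value - X, 'fro') / np.linalg.norm(X, 'fro'))
```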


References

  1. Foucart, S., Rauhut, H.: A Mathematical Introduction to Compressive Sensing. Birkhäuser (2013)

  2. Eldar, Y.C., Kutyniok, G. (eds.): Compressed Sensing: Theory and Applications. Cambridge University Press (2012)

  3. Cohen, D., Eldar, Y.C.: Sub-Nyquist radar systems: temporal, spectral and spatial compression. IEEE Signal Process. Mag. 35(6), 35–57 (2018)

  4. Davenport, M.A., Romberg, J.: An overview of low-rank matrix recovery from incomplete observations. arXiv:1601.06422 (2016)

  5. Duarte, M.F., Baraniuk, R.G.: Kronecker compressive sensing. IEEE Trans. Image Process. 21(2), 494–504 (2012)

  6. Dasarathy, G., Shah, P., Bhaskar, B.N., Nowak, R.: Sketching sparse matrices. arXiv:1303.6544 (2013)

  7. Chandrasekaran, V., Recht, B., Parrilo, P.A., Willsky, A.S.: The convex geometry of linear inverse problems. Found. Comput. Math. 12, 805–849 (2012)

  8. Tropp, J.A.: Convex recovery of a structured signal from independent random linear measurements. In: Pfander, G. (ed.) Sampling Theory, a Renaissance: Compressive Sensing and Other Developments. Birkhäuser (2015)

  9. Mendelson, S.: Learning without concentration. J. ACM 62(3) (2014)

  10. Mendelson, S., Pajor, A., Tomczak-Jaegermann, N.: Reconstruction and subgaussian operators in asymptotic geometric analysis. Geom. Funct. Anal. 17(4), 1248–1282 (2007)

  11. Van Lint, J.H., Wilson, R.M.: A Course in Combinatorics. Springer-Verlag (1995)

  12. Vershynin, R.: High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press (2018)

  13. Ledoux, M., Talagrand, M.: Probability in Banach Spaces: Isoperimetry and Processes. Springer, Heidelberg (1991). https://doi.org/10.1007/978-3-642-20212-4

  14. Dai, W., Li, Y., Zou, J., Xiong, H., Zheng, Y.: Fully decomposable compressive sampling with optimization for multidimensional sparse representation. IEEE Trans. Signal Process. 66(3), 603–616 (2018)

  15. Vaiter, S., Peyré, G., Fadili, J.: Model consistency of partly smooth regularizers. IEEE Trans. Inf. Theory 64(3), 1725–1747 (2018)

  16. Oymak, S., Tropp, J.A.: Universality laws for randomized dimension reduction with applications. Inf. Inference 7, 337–386 (2017)


Appendices

Appendix A: Proofs of Theorems in Section 3

Proof

of Lemma 1. It is easy to verify that the set \(\{(\lambda _{1}\xi _{1},\ldots ,\lambda _{n}\xi _{n}): \xi _{j}\in \partial |\mathbf{x}_{j}|_{1}\ \text {and}\ \lambda _{j}\ge 0\ \text {for all}\ j,\ \lambda _{1}+\ldots +\lambda _{n}=1\ \text {and}\ \lambda _{j}=0\ \text {for}\ j: |\mathbf{x}_{j}|_{1}<\max _{k}|\mathbf{x}_{k}|_{1}\}\) is contained in \(\partial |||\mathrm{X}|||_{1}\): for any \(\mathrm{M}\equiv (\lambda _{1}\xi _{1},\ldots ,\lambda _{n}\xi _{n})\) in this set, we have

$$\begin{aligned} \langle \mathrm{M},\mathrm{X}\rangle =\sum _{j}\lambda _{j}\langle \xi _{j},\mathbf{x}_{j}\rangle =\sum _{j}\lambda _{j}|\mathbf{x}_{j}|_{1}=|||\mathrm{X}|||_{1}\sum _{j}\lambda _{j}=|||\mathrm{X}|||_{1} \end{aligned}$$

and \(|||\cdot |||_{1}\)'s conjugate norm satisfies \(|||\mathrm{M}|||_{1}^{*}=\sum _{j}\lambda _{j}|\xi _{j}|_{\infty }\le \sum _{j}\lambda _{j}=1\); as a result M is in \(\partial |||\mathrm{X}|||_{1}\).

Now we prove that any M in \(\partial |||\mathrm{X}|||_{1}\) has the form of a member of the above set. Let \(\mathrm{M}\equiv (\varvec{\eta }_{1},\ldots ,\varvec{\eta }_{n})\). The subgradient inequality \(|||\mathrm{Y}|||_{1}\ge |||\mathrm{X}|||_{1}+\langle \mathrm{Y}-\mathrm{X},\mathrm{M}\rangle \) for all \(\mathrm{Y}\equiv (\mathbf{y}_{1},\ldots ,\mathbf{y}_{n})\) implies:

$$\begin{aligned} \max _{j}|\mathbf{y}_{j}|_{1}\ge \max _{j}|\mathbf{x}_{j}|_{1}+\sum _{j}\langle \mathbf{y}_{j}-\mathbf{x}_{j},\varvec{\eta }_{j}\rangle \end{aligned}$$
(37)

Write \(\varvec{\eta }_{j}=|\varvec{\eta }_{j}|_{\infty }\xi _{j}\) (so \(|\xi _{j}|_{\infty }=1\) if \(\varvec{\eta }_{j}\ne 0\)); then \(\max _{j}|\mathbf{y}_{j}|_{1}\ge \max _{j}|\mathbf{x}_{j}|_{1}+\sum _{j}|\varvec{\eta }_{j}|_{\infty }\langle \mathbf{y}_{j}-\mathbf{x}_{j},\xi _{j}\rangle \). For each \(j\) with \(\varvec{\eta }_{j}\ne 0\) we can select an index \(i_{j}\) such that \(|\xi _{j}(i_{j})|=1\); let \(\mathbf{e}_{j}^{*}\) be the vector with components \(e_{j}^{*}(i_{j})=\mathrm{sgn}\,\xi _{j}(i_{j})\) and \(e_{j}^{*}(i)=0\) for all \(i\ne i_{j}\). Then for \(\mathbf{y}_{j}=\mathbf{x}_{j}+\mathbf{e}_{j}^{*}\), \(j=1,\ldots ,n\), (37) implies

$$\begin{aligned} 1+\max _{j}|\mathbf{x}_{j}|_{1}\ge \max _{j}|\mathbf{y}_{j}|_{1}\ge \max _{j}|\mathbf{x}_{j}|_{1}+\sum \nolimits _{j}|\varvec{\eta }_{j}|_{\infty }\langle \mathbf{e}_{j}^{*},\xi _{j}\rangle =\max _{j}|\mathbf{x}_{j}|_{1}+\sum \nolimits _{j}|\varvec{\eta }_{j}|_{\infty }|\xi _{j}|_{\infty }=\max _{j}|\mathbf{x}_{j}|_{1}+\sum \nolimits _{j}|\varvec{\eta }_{j}|_{\infty } \end{aligned}$$

As a result \(1\ge \sum _{j}|\varvec{\eta }_{j}|_{\infty }\).

Furthermore, for any given \(i\), let \(\mathbf{y}_{j}=\mathbf{x}_{j}\) for all \(j\ne i\) and let \(\mathbf{y}_{i}\) be any vector satisfying \(|\mathbf{y}_{i}|_{1}\le |\mathbf{x}_{i}|_{1}\). Substituting these \(\mathbf{y}_{1},\ldots ,\mathbf{y}_{n}\) into (37) we obtain

$$\begin{aligned} \max _{j}|\mathbf{x}_{j}|_{1}\ge \max _{j}|\mathbf{y}_{j}|_{1}\ge \max _{j}|\mathbf{x}_{j}|_{1}+\sum \nolimits _{j}\langle \mathbf{y}_{j}-\mathbf{x}_{j},\varvec{\eta }_{j}\rangle =\max _{j}|\mathbf{x}_{j}|_{1}+|\varvec{\eta }_{i}|_{\infty }\langle \mathbf{y}_{i}-\mathbf{x}_{i},\xi _{i}\rangle \end{aligned}$$

i.e., \(\langle \mathbf{y}_{i}-\mathbf{x}_{i},\xi _{i}\rangle \le 0\). As a result \(\langle \mathbf{x}_{i},\xi _{i}\rangle \ge \langle \mathbf{y}_{i},\xi _{i}\rangle \) for all \(\mathbf{y}_{i}\) with \(|\mathbf{y}_{i}|_{1}\le |\mathbf{x}_{i}|_{1}\), so \(\langle \mathbf{x}_{i},\xi _{i}\rangle \ge |\mathbf{x}_{i}|_{1}|\xi _{i}|_{\infty }=|\mathbf{x}_{i}|_{1}\); hence finally \(\langle \mathbf{x}_{i},\xi _{i}\rangle =|\mathbf{x}_{i}|_{1}\). Together with \(|\xi _{i}|_{\infty }=1\), this implies \(\xi _{i}\in \partial |\mathbf{x}_{i}|_{1}\) if \(\varvec{\eta }_{i}\ne 0\), for any \(i=1,\ldots ,n\).

In summary, we have proved that any \(\mathrm{M}\in \partial |||\mathrm{X}|||_{1}\) has the form \((\lambda _{1}\xi _{1},\ldots ,\lambda _{n}\xi _{n})\) where \(\xi _{j}\in \partial |\mathbf{x}_{j}|_{1}\), \(\lambda _{j}\ge 0\) for all \(j\) and \(\lambda _{1}+\ldots +\lambda _{n}\le 1\). Since \(|||\mathrm{X}|||_{1}=\langle \mathrm{M},\mathrm{X}\rangle =\sum _{j}\lambda _{j}\langle \xi _{j},\mathbf{x}_{j}\rangle =\sum _{j}\lambda _{j}|\mathbf{x}_{j}|_{1}\le \max _{j}|\mathbf{x}_{j}|_{1}\sum _{j}\lambda _{j}\le |||\mathrm{X}|||_{1}\), we conclude that \(\lambda _{1}+\ldots +\lambda _{n}=1\) and \(\lambda _{j}=0\) for \(j:|\mathbf{x}_{j}|_{1}<\max _{k}|\mathbf{x}_{k}|_{1}\).

   \(\square \)
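
As a quick numerical illustration of Lemma 1 (a sketch, not part of the paper): for an \(l_1\)-column-flat X with no zero entries, any \(\mathrm{M}=(\lambda _{1}\xi _{1},\ldots ,\lambda _{n}\xi _{n})\) with \(\xi _{j}=\mathrm{sgn}(\mathbf{x}_{j})\), \(\lambda _{j}\ge 0\) and \(\sum _{j}\lambda _{j}=1\) should satisfy the subgradient inequality \(|||\mathrm{Y}|||_{1}\ge |||\mathrm{X}|||_{1}+\langle \mathrm{Y}-\mathrm{X},\mathrm{M}\rangle \). The sizes and random test points below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
norm1 = lambda X: np.max(np.abs(X).sum(axis=0))    # |||X|||_1 = max_j |x_j|_1

# A flat X: every column has the same l1 norm (here 1), and no zero entries a.s.
X = rng.standard_normal((n, n))
X /= np.abs(X).sum(axis=0, keepdims=True)

# M = (lambda_1 xi_1, ..., lambda_n xi_n) with xi_j = sgn(x_j) in the subdifferential of |x_j|_1.
lam = rng.random(n); lam /= lam.sum()
M = np.sign(X) * lam                               # column j scaled by lambda_j

# Check the subgradient inequality |||Y|||_1 >= |||X|||_1 + <Y - X, M> on random test points.
for _ in range(1000):
    Y = rng.standard_normal((n, n))
    assert norm1(Y) >= norm1(X) + np.sum((Y - X) * M) - 1e-12
print("subgradient inequality verified on 1000 random test matrices")
```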

Proof

of Theorem 1. To prove necessity, let \(\mathrm{S}\) be an s-sparsity pattern and \(\mathrm{H}\in \ker \varPhi \backslash \{\mathrm{O}\}\). Set \(\mathbf{y}\equiv \varPhi (\mathrm{H}_{\mathrm{S}})=\varPhi (-\mathrm{H}_{\sim \mathrm{S}})\); since \(\mathrm{H}_{\mathrm{S}}\in \sum _{s}^{n\times n}\), \(\mathrm{H}_{\mathrm{S}}\) should be the unique minimizer of \(\mathrm{MP}_{\mathbf{y},\varPhi ,0}\), with \(-\mathrm{H}_{\sim \mathrm{S}}\) a feasible solution, hence \(|||\mathrm{H}_{\mathrm{S}}|||_{1}<|||\mathrm{H}_{\sim \mathrm{S}}|||_{1}\).

Now we prove sufficiency. Let \(\mathrm{X}=(\mathbf{x}_{1},\ldots ,\mathbf{x}_{n})\) be a matrix signal whose support \(\mathrm{S}=\mathrm{S}_{1}\cup \ldots \cup \mathrm{S}_{n}\) is an s-sparsity pattern (where \(\mathrm{S}_{j}=\mathrm{supp}(\mathbf{x}_{j})\)), and let \(\mathbf{y}=\varPhi (\mathrm{X})\). For any feasible solution \(\mathrm{Z}\,(\ne \mathrm{X})\) of \(\mathrm{MP}_{\mathbf{y},\varPhi ,0}\), there obviously exists \(\mathrm{H}=(\mathbf{h}_{1},\ldots ,\mathbf{h}_{n})\) in \(\ker \varPhi \backslash \{\mathrm{O}\}\) such that \(\mathrm{Z}=\mathrm{X}+\mathrm{H}\). Since \(|||\mathrm{Z}|||_{1}\ge |||\mathrm{X}|||_{1}+\langle \mathrm{H},\mathrm{M}\rangle \) for any \(\mathrm{M}\) in \(\partial |||\mathrm{X}|||_{1}\), we have

\(|||\mathrm{Z}|||_{1}-|||\mathrm{X}|||_{1}\ge \sup \{\langle \mathrm{H},\mathrm{M}\rangle : \mathrm{M}\ \text {in}\ \partial |||\mathrm{X}|||_{1}\}\)

\(=\sup \{\langle \mathrm{H},\mathrm{M}\rangle : \mathrm{M}=\mathrm{E}+\mathrm{V}\ \text {where}\ \mathrm{E}=(\lambda _{1}\,\mathrm{sgn}(\mathbf{x}_{1}),\ldots ,\lambda _{n}\,\mathrm{sgn}(\mathbf{x}_{n}))\ \text {and}\ \mathrm{V}=(\lambda _{1}\xi _{1},\ldots ,\lambda _{n}\xi _{n}),\ |\xi _{j}|_{\infty }\le 1,\ \lambda _{j}\ge 0\ \text {for all}\ j,\ \lambda _{1}+\ldots +\lambda _{n}=1\}\) (by Lemma 1, noticing that \(\mathrm{supp}(\mathrm{sgn}(\mathbf{x}_{j}))=\mathrm{S}_{j}=\,\sim \mathrm{supp}(\xi _{j})\))

\(\ge \sup \{-|\langle \mathrm{H},\mathrm{E}\rangle |+\langle \mathrm{H},\mathrm{V}\rangle : \mathrm{E}\ \text {and}\ \mathrm{V}\ \text {specified as above}\}\)

\(=\sup \{-|\sum _{j=1}^{n}\lambda _{j}\langle \mathbf{h}_{j|Sj},\mathrm{sgn}(\mathbf{x}_{j})\rangle |+\sum _{j=1}^{n}\lambda _{j}\langle \mathbf{h}_{j|\sim Sj},\xi _{j}\rangle : \lambda _{j}\ \text {and}\ \xi _{j}\ \text {specified as above}\}\)

\(\ge -\sup \{|\sum _{j=1}^{n}\lambda _{j}\langle \mathbf{h}_{j|Sj},\mathrm{sgn}(\mathbf{x}_{j})\rangle |: \lambda _{j}\ge 0\ \text {for all}\ j,\ \lambda _{1}+\ldots +\lambda _{n}=1\}+\sup \{\langle \mathrm{H}_{\sim \mathrm{S}},\mathrm{V}\rangle : |||\mathrm{V}|||_{1}^{*}\le 1\}\)

(note that \(|||\mathrm{V}|||_{1}^{*}=\sum _{j}|\lambda _{j}\xi _{j}|_{\infty }\le \sum _{j}\lambda _{j}=1\), where \(|||\cdot |||_{1}^{*}\) is the conjugate norm of \(|||\cdot |||_{1}\))

\(=-\sup \{|\sum _{j=1}^{n}\lambda _{j}\langle \mathbf{h}_{j|Sj},\mathrm{sgn}(\mathbf{x}_{j})\rangle |: \lambda _{j}\ge 0\ \text {for all}\ j,\ \lambda _{1}+\ldots +\lambda _{n}=1\}+|||\mathrm{H}_{\sim \mathrm{S}}|||_{1}\)

\(=-\max _{j}|\langle \mathbf{h}_{j|Sj},\mathrm{sgn}(\mathbf{x}_{j})\rangle |+|||\mathrm{H}_{\sim \mathrm{S}}|||_{1}\)

$$\begin{aligned} \ge -\max _{j}|\mathbf{h}_{j|Sj}|_{1}+|||\mathrm{H}_{\sim \mathrm{S}}|||_{1}=-|||\mathrm{H}_{\mathrm{S}}|||_{1}+|||\mathrm{H}_{\sim \mathrm{S}}|||_{1}>0 \end{aligned}$$

under the condition (3.3). As a result, X is the unique minimizer of \(\mathrm{MP}_{\mathbf{y},\varPhi ,0}\).    \(\square \)

The proof of Theorem 2 follows almost the same logic as the proof of \(l_{1}\)-min reconstruction stability for vector signals under the \(l_{1}\) null space property assumption (e.g., see Sect. 4.2 in [1]). For completeness we provide the simple proof here. The basic tool is an auxiliary inequality (which, unfortunately, does not hold for the matrix norm \(|||\cdot |||_{1}\)): for a given index subset \(\varDelta \) and any vectors x, z [1],

$$\begin{aligned} \left| (\mathbf {x}-\mathbf {z})_{\sim \varDelta }\right| _{1} \le |\mathbf {z}|_{1}-|\mathbf {x}|_{1}+\left| (\mathbf {x}-\mathbf {z})_{\varDelta }\right| _{1}+2\left| \mathbf {x}_{\sim \varDelta }\right| _{1} \end{aligned}$$
(38)
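
As a quick sanity check of (38) before it is used below, the following sketch (illustrative only, not part of the paper) verifies the inequality on random vectors and random index subsets.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 20, 5
for _ in range(1000):
    x, z = rng.standard_normal(n), rng.standard_normal(n)
    delta = rng.choice(n, size=k, replace=False)          # index subset Delta
    mask = np.zeros(n, dtype=bool); mask[delta] = True
    lhs = np.abs((x - z)[~mask]).sum()                    # |(x - z)_{~Delta}|_1
    rhs = (np.abs(z).sum() - np.abs(x).sum()
           + np.abs((x - z)[mask]).sum() + 2 * np.abs(x[~mask]).sum())
    assert lhs <= rhs + 1e-12
print("inequality (38) verified on 1000 random instances")
```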

Proof

of Theorem 2. For any feasible solution \(\mathrm{Z}=(\mathbf{z}_{1},\ldots ,\mathbf{z}_{n})\) of problem \(\mathrm{MP}_{\mathbf{y},\varPhi ,0}\), where \(\mathbf{y}=\varPhi (\mathrm{X})\), there is \(\mathrm{H}=(\mathbf{h}_{1},\ldots ,\mathbf{h}_{n})\) in \(\ker \varPhi \) such that \(\mathrm{Z}=\mathrm{X}+\mathrm{H}\). Applying (38) to each pair of column vectors \(\mathbf{z}_{j}\) and \(\mathbf{x}_{j}\) we get

$$ \left| \mathbf {h}_{j|\sim Sj}\right| _1 \le \left| \mathbf {z}_{j}\right| _{1}-\left| \mathbf {x}_{j}\right| _{1}+\left| \mathbf {h}_{j|Sj}\right| _{1}+2\left| \mathbf {x}_{j|\sim S j}\right| _1 $$

Hence \(|||\mathrm{H}_{\sim \mathrm{S}}|||_{1}\equiv \max _{j}|\mathbf{h}_{j|\sim Sj}|_{1}\le \max _{j}(|\mathbf{z}_{j}|_{1}-|\mathbf{x}_{j}|_{1})+|||\mathrm{H}_{\mathrm{S}}|||_{1}+2\max _{j}|\mathbf{x}_{j|\sim Sj}|_{1}\le \max _{j}(|\mathbf{z}_{j}|_{1}-|\mathbf{x}_{j}|_{1})+\rho |||\mathrm{H}_{\sim \mathrm{S}}|||_{1}+2\max _{j}|\mathbf{x}_{j|\sim Sj}|_{1}\) (by (10)), namely:

$$ |||\mathrm{H}_{\sim \mathrm{S}}|||_{1}\le (1-\rho )^{-1}\left( 2\max _{j}|\mathbf{x}_{j|\sim Sj}|_{1}+\max _{j}(|\mathbf{z}_{j}|_{1}-|\mathbf{x}_{j}|_{1})\right) $$

As a result \(|||\mathrm{H}|||_{1}\le |||\mathrm{H}_{\mathrm{S}}|||_{1}+|||\mathrm{H}_{\sim \mathrm{S}}|||_{1}\le (1+\rho )|||\mathrm{H}_{\sim \mathrm{S}}|||_{1}\le (1-\rho )^{-1}(1+\rho )\left( 2\max _{j}|\mathbf{x}_{j|\sim Sj}|_{1}+\max _{j}(|\mathbf{z}_{j}|_{1}-|\mathbf{x}_{j}|_{1})\right) \) for any s-sparsity pattern S, which implies (11) since \(\min _{\mathrm{S}}\max _{j}|\mathbf{x}_{j|\sim Sj}|_{1}=\max _{j}\sigma _{s}(\mathbf{x}_{j})_{1}\).

In particular, if Z is the minimizer \(\mathrm{X}^{*}\) and X is \(l_{1}\)-column-flat, then \(|\mathbf{x}_{j}|_{1}=|||\mathrm{X}|||_{1}\) for every \(j\), so \(\max _{j}(|\mathbf{x}^{*}_{j}|_{1}-|\mathbf{x}_{j}|_{1})=|||\mathrm{X}^{*}|||_{1}-|||\mathrm{X}|||_{1}\le 0\) for the minimizer \(\mathrm{X}^{*}\), which implies the conclusion.    \(\square \)

Remark: For any flat and sparse signal X, condition (10) guarantees that X can be uniquely reconstructed by solving \(\mathrm{MP}_{\mathbf{y},\varPhi ,0}\) (Theorem 1), and in this case the right-hand side of (12) is zero, i.e., this theorem is consistent with the former one. In addition, (12) indicates that the error with which the minimizer \(\mathrm{X}^{*}\) approximates the flat but non-sparse signal X is controlled column-wise by X's non-sparsity, as measured by \(\max _{j}\sigma _{s}(\mathbf{x}_{j})_{1}\).

Proof

of Theorem 3. Consider first the problem \(\mathrm{MP}_{\mathbf{y},\varPhi ,\eta }\): \(\inf |||\mathrm{Z}|||_{1}\) s.t. \(\mathrm{Z}\in R^{n\times n}\), \(|\mathbf{y}-\varPhi _{\mathrm{S}}(\mathrm{Z})|_{2}\le \eta \), where \(\eta >0\). For any minimizer \(\mathrm{X}^{*}\) of this problem, with both the objective \(|||\cdot |||_{1}\) and the constraint function \(|\mathbf{y}-\varPhi _{\mathrm{S}}(\cdot )|_{2}\) convex, general convex optimization theory provides a positive multiplier \(\gamma ^{*}>0\) and an \(\mathrm{M}^{*}\) in \(\partial |||\mathrm{X}^{*}|||_{1}\) such that

$$\begin{aligned} \mathrm {M}^{*}+\gamma ^{*} \varPhi _{\mathrm {S}}^{\mathrm {T}}\left( \varPhi _{\mathrm {S}}\left( \mathrm {X}^{*}\right) -\mathbf {y}\right) =\mathrm {O} \text{ and } \left| \mathbf {y}-\varPhi _{\mathrm {S}}\left( \mathrm {X}^{*}\right) \right| _{2}=\eta \end{aligned}$$
(39)

then \(\mathrm{M}^{*}=\gamma ^{*}\varPhi _{\mathrm{S}}^{\mathrm{T}}(\mathbf{y}-\varPhi _{\mathrm{S}}(\mathrm{X}^{*}))\) cannot have any zero column since \(\mathbf{y}-\varPhi _{\mathrm{S}}(\mathrm{X}^{*})\ne \mathbf{0}\), which implies \(|\mathbf{x}^{*}_{j}|_{1}=\max _{k}|\mathbf{x}^{*}_{k}|_{1}\) for every \(j\), according to Lemma 1.

Now consider the problem \(\mathrm{MP}_{\mathbf{y},\varPhi ,0}\): \(\inf |||\mathrm{Z}|||_{1}\) s.t. \(\mathrm{Z}\in R^{n\times n}\), \(\mathbf{y}=\varPhi _{\mathrm{S}}(\mathrm{Z})\). For its minimizer \(\mathrm{X}^{*}\) there is a multiplier vector \(\mathbf{u}\) such that \(\mathrm{M}^{*}+\varPhi _{\mathrm{S}}^{\mathrm{T}}(\mathbf{u})=\mathrm{O}\). If \(\mathbf{u}\ne \mathbf{0}\) then \(\mathrm{M}^{*}\) has no zero column, which implies \(|\mathbf{x}^{*}_{j}|_{1}=\max _{k}|\mathbf{x}^{*}_{k}|_{1}\) for every \(j\), according to Lemma 1. On the other hand, \(\mathbf{u}=\mathbf{0}\) implies \(\mathrm{M}^{*}=\mathrm{O}\), which cannot happen according to Lemma 1 unless \(\mathrm{X}^{*}=\mathrm{O}\).    \(\square \)

Proof

of Theorem 4. For any feasible solution \(\mathrm{Z}=(\mathbf{z}_{1},\ldots ,\mathbf{z}_{n})\) of problem \(\mathrm{MP}_{\mathbf{y},\varPhi ,\eta }\), where \(\mathbf{y}=\varPhi (\mathrm{X})+\textit{\textbf{e}}\), let \(\mathrm{Z}-\mathrm{X}=\mathrm{H}=(\mathbf{h}_{1},\ldots ,\mathbf{h}_{n})\). Applying (38) to each pair of column vectors \(\mathbf{z}_{j}\) and \(\mathbf{x}_{j}\) we get \(|\mathbf{h}_{j|\sim Sj}|_{1}\le |\mathbf{z}_{j}|_{1}-|\mathbf{x}_{j}|_{1}+|\mathbf{h}_{j|Sj}|_{1}+2|\mathbf{x}_{j|\sim Sj}|_{1}\). Hence \(|||\mathrm{H}_{\sim \mathrm{S}}|||_{1}\equiv \max _{j}|\mathbf{h}_{j|\sim Sj}|_{1}\le \max _{j}(|\mathbf{z}_{j}|_{1}-|\mathbf{x}_{j}|_{1})+|||\mathrm{H}_{\mathrm{S}}|||_{1}+2\max _{j}|\mathbf{x}_{j|\sim Sj}|_{1}\le \max _{j}(|\mathbf{z}_{j}|_{1}-|\mathbf{x}_{j}|_{1})+\rho |||\mathrm{H}_{\sim \mathrm{S}}|||_{1}+2\max _{j}|\mathbf{x}_{j|\sim Sj}|_{1}+\beta |\varPhi (\mathrm{H})|_{2}\) (by (14)), namely:

$$ |||\mathrm{H}_{\sim \mathrm{S}}|||_{1}\le (1-\rho )^{-1}\left( 2\max _{j}|\mathbf{x}_{j|\sim Sj}|_{1}+\max _{j}(|\mathbf{z}_{j}|_{1}-|\mathbf{x}_{j}|_{1})+\beta |\varPhi (\mathrm{H})|_{2}\right) $$

As a result \(|||\mathrm{H}|||_{1}\le |||\mathrm{H}_{\mathrm{S}}|||_{1}+|||\mathrm{H}_{\sim \mathrm{S}}|||_{1}\le (1+\rho )|||\mathrm{H}_{\sim \mathrm{S}}|||_{1}+\beta |\varPhi (\mathrm{H})|_{2}\le (1-\rho )^{-1}(1+\rho )\left( 2\max _{j}|\mathbf{x}_{j|\sim Sj}|_{1}+\max _{j}(|\mathbf{z}_{j}|_{1}-|\mathbf{x}_{j}|_{1})\right) +2(1-\rho )^{-1}\beta |\varPhi (\mathrm{H})|_{2}\) for any s-sparsity pattern S, which implies (15) since \(\min _{\mathrm{S}}\max _{j}|\mathbf{x}_{j|\sim Sj}|_{1}=\max _{j}\sigma _{s}(\mathbf{x}_{j})_{1}\).

In particular, if Z is a minimizer \(\mathrm{X}^{*}\) and X is \(l_{1}\)-column-flat, then \(|\mathbf{x}_{j}|_{1}=|||\mathrm{X}|||_{1}\) for every \(j\), so \(\max _{j}(|\mathbf{x}^{*}_{j}|_{1}-|\mathbf{x}_{j}|_{1})=|||\mathrm{X}^{*}|||_{1}-|||\mathrm{X}|||_{1}\le 0\) for the minimizer \(\mathrm{X}^{*}\), which implies (16).    \(\square \)

Proof

of Theorem 5. Let \(\mathrm{H}=(\mathbf{h}_{1},\ldots ,\mathbf{h}_{n})\) be any n-by-n matrix. For each \(j\), suppose \(|h_{j}(i_{1})|\ge |h_{j}(i_{2})|\ge \ldots \ge |h_{j}(i_{n})|\), and let \(\mathrm{S}_{0}(j)=\{(i_{1},j),\ldots ,(i_{s},j)\}\), i.e., the set of indices of the s components of column \(\mathbf{h}_{j}\) with the largest absolute values; let \(\mathrm{S}_{1}(j)=\{(i_{s+1},j),\ldots ,(i_{2s},j)\}\) be the set of indices of the s components of \(\mathbf{h}_{j}\) with the next largest absolute values, etc. For any \(k=0,1,2,\ldots \) let \(\mathrm{S}_{k}=\cup _{j=1}^{n}\mathrm{S}_{k}(j)\); obviously \(\mathrm{H}=\sum _{k\ge 0}\mathrm{H}_{\mathrm{S}k}\). First note that (20) holds for S as long as it holds for \(\mathrm{S}_{0}\), so we prove the latter in the following. Start from condition (1):

\((1-\delta _{s})|\mathrm{H}_{\mathrm{S}0}|_{F}^{2}\le |\varPhi (\mathrm{H}_{\mathrm{S}0})|_{2}^{2}=\langle \varPhi (\mathrm{H}_{\mathrm{S}0}),\varPhi (\mathrm{H})-\sum _{k\ge 1}\varPhi (\mathrm{H}_{\mathrm{S}k})\rangle \)

\(=\langle \varPhi (\mathrm{H}_{\mathrm{S}0}),\varPhi (\mathrm{H})\rangle -\sum _{k\ge 1}\langle \varPhi (\mathrm{H}_{\mathrm{S}0}),\varPhi (\mathrm{H}_{\mathrm{S}k})\rangle \)

\(\le |\varPhi (\mathrm{H}_{\mathrm{S}0})|_{2}|\varPhi (\mathrm{H})|_{2}+(\varDelta _{s}/n)\sum _{n\ge j\ge 1}\sum _{k\ge 1}|\mathbf{h}_{j|S0(j)}|_{2}|\mathbf{h}_{j|Sk(j)}|_{2}\) (by condition (2))

\(\le (1+\delta _{s})^{1/2}|\mathrm{H}_{\mathrm{S}0}|_{F}|\varPhi (\mathrm{H})|_{2}+(\varDelta _{s}/n)|\mathrm{H}_{\mathrm{S}0}|_{F}\sum _{n\ge j\ge 1}\sum _{k\ge 1}|\mathbf{h}_{j|Sk(j)}|_{2}\) (by condition (1) and \(|\mathbf{h}_{j|S0(j)}|_{2}\le |\mathrm{H}_{\mathrm{S}0}|_{F}\))

\(\le (1+\delta _{s})^{1/2}|\mathrm{H}_{\mathrm{S}0}|_{F}|\varPhi (\mathrm{H})|_{2}+(\varDelta _{s}/n)|\mathrm{H}_{\mathrm{S}0}|_{F}\sum _{n\ge j\ge 1}\left( s^{-1/2}|\mathbf{h}_{j|\sim S0(j)}|_{1}+(1/4)|\mathbf{h}_{j|S0(j)}|_{2}\right) \) (by the inequality \((\sum _{s\ge k\ge 1}a_{k}^{2})^{1/2}\le s^{-1/2}\sum _{s\ge k\ge 1}a_{k}+(s^{1/2}/4)(a_{1}-a_{s})\) for \(a_{1}\ge a_{2}\ge \ldots \ge a_{s}\ge 0\) and the fact \(\min _{s\ge i\ge 1}|\mathbf{h}_{j|Sk(j)}(i)|\ge \max _{s\ge i\ge 1}|\mathbf{h}_{j|Sk+1(j)}(i)|\) for any \(j\))

$$\begin{aligned}&\le |\mathrm{H}_{\mathrm{S}0}|_{F}\left( (1+\delta _{s})^{1/2}|\varPhi (\mathrm{H})|_{2}+(s^{-1/2}\varDelta _{s}/n)\sum \nolimits _{n\ge j\ge 1}|\mathbf{h}_{j|\sim S0(j)}|_{1}+(\varDelta _{s}/4n)\sum \nolimits _{n\ge j\ge 1}|\mathbf{h}_{j|S0(j)}|_{2}\right) \\&\le |\mathrm{H}_{\mathrm{S}0}|_{F}\left( (1+\delta _{s})^{1/2}|\varPhi (\mathrm{H})|_{2}+s^{-1/2}\varDelta _{s}\max _{j}|\mathbf{h}_{j|\sim S0(j)}|_{1}+(\varDelta _{s}/4n)n^{1/2}\left( \sum \nolimits _{n\ge j\ge 1}|\mathbf{h}_{j|S0(j)}|_{2}^{2}\right) ^{1/2}\right) \\&=|\mathrm{H}_{\mathrm{S}0}|_{F}\left( (1+\delta _{s})^{1/2}|\varPhi (\mathrm{H})|_{2}+s^{-1/2}\varDelta _{s}|||\mathrm{H}_{\sim \mathrm{S}0}|||_{1}+(\varDelta _{s}/4n^{1/2})|\mathrm{H}_{\mathrm{S}0}|_{F}\right) \end{aligned}$$

Cancelling \(|\mathrm{H}_{\mathrm{S}0}|_{F}\) on both sides we get \((1-\delta _{s})|\mathrm{H}_{\mathrm{S}0}|_{F}\le (1+\delta _{s})^{1/2}|\varPhi (\mathrm{H})|_{2}+s^{-1/2}\varDelta _{s}|||\mathrm{H}_{\sim \mathrm{S}0}|||_{1}+(\varDelta _{s}/4n^{1/2})|\mathrm{H}_{\mathrm{S}0}|_{F}\), hence

\(|\mathrm{H}_{\mathrm{S}0}|_{F}\le (1-\delta _{s}-\varDelta _{s}/4n^{1/2})^{-1}\left( (1+\delta _{s})^{1/2}|\varPhi (\mathrm{H})|_{2}+s^{-1/2}\varDelta _{s}|||\mathrm{H}_{\sim \mathrm{S}0}|||_{1}\right) \)

Note that \(|||\mathrm{H}_{\mathrm{S}0}|||_{1}=\max _{j}|\mathbf{h}_{j|S0(j)}|_{1}\le s^{1/2}\max _{j}|\mathbf{h}_{j|S0(j)}|_{2}\le s^{1/2}|\mathrm{H}_{\mathrm{S}0}|_{F}\); combining this with the above inequality, we obtain (20) and (21) for \(\mathrm{S}_{0}\), which implies that they hold for any S.    \(\square \)
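
The auxiliary inequality \((\sum _{s\ge k\ge 1}a_{k}^{2})^{1/2}\le s^{-1/2}\sum _{s\ge k\ge 1}a_{k}+(s^{1/2}/4)(a_{1}-a_{s})\) used above is easy to check numerically; the following sketch (illustrative only, not part of the paper) tests it on random nonincreasing nonnegative sequences.

```python
import numpy as np

rng = np.random.default_rng(3)
for _ in range(10000):
    s = rng.integers(1, 30)
    a = np.sort(rng.random(s))[::-1]          # a_1 >= a_2 >= ... >= a_s >= 0
    lhs = np.sqrt(np.sum(a ** 2))
    rhs = a.sum() / np.sqrt(s) + (np.sqrt(s) / 4) * (a[0] - a[-1])
    assert lhs <= rhs + 1e-12
print("auxiliary inequality verified on 10000 random sequences")
```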

Appendix B: Proofs of Theorems in Section 4

Proof

of Lemma 2. (1) Observe that when \(\varPhi _{\mathrm{S}}^{\mathrm{T}}\varPhi _{\mathrm{S}}\) is a bijection, the objective function of (23), \(L_{\mathrm{S}}(\mathrm{Z})=|||\mathrm{Z}|||_{1}+(1/2)\gamma |\mathbf{y}-\varPhi _{\mathrm{S}}(\mathrm{Z})|_{2}^{2}\), is strictly convex in the variable \(\mathrm{Z}\in \sum _{s}^{n\times n}(\mathrm{S})\). According to general convex programming theory, its minimizer \(\mathrm{X}^{*}_{\mathrm{S}}\) is unique.

(2) Let \(L(\mathrm{Z}):=|||\mathrm{Z}|||_{1}+(1/2)\gamma |\mathbf{y}-\varPhi (\mathrm{Z})|_{2}^{2}\). To prove that \(\mathrm{X}^{*}_{\mathrm{S}}\) is also the global minimizer of (22), we show that any perturbation by H increases the objective value, i.e., \(L(\mathrm{X}^{*}_{\mathrm{S}}+\mathrm{H})>L(\mathrm{X}^{*}_{\mathrm{S}})\) under the conditions specified by (1), (2), (3). Since conclusion (1) implies \(L(\mathrm{X}^{*}_{\mathrm{S}}+\mathrm{H})>L(\mathrm{X}^{*}_{\mathrm{S}})\) for any \(\mathrm{H}\ne \mathrm{O}\) with support in S, and \(L(\mathrm{Z})\) is convex, we only need to consider perturbations \(\mathrm{X}^{*}_{\mathrm{S}}+\mathrm{H}\) with \(\mathrm{H}_{\mathrm{S}}=\mathrm{O}\).

Since \(\mathrm{X}^{*}_{\mathrm{S}}\) is the minimizer of (23), by the first-order optimality condition there exists \(\mathrm{M}^{*}\) in \(\partial |||\mathrm{X}^{*}_{\mathrm{S}}|||_{1}\) such that

$$\begin{aligned} \mathrm {M}^{*}+\gamma \varPhi _{\mathrm {S}}^{\mathrm {T}}\left( \varPhi _{\mathrm {S}}\left( \mathrm {X}_{\mathrm {S}}^{*}\right) -\mathbf {y}\right) =\mathrm {O} \end{aligned}$$
(40)

then \(\mathrm{M}^{*}=\gamma \varPhi _{\mathrm{S}}^{\mathrm{T}}(\mathbf{y}-\varPhi _{\mathrm{S}}(\mathrm{X}^{*}_{\mathrm{S}}))\), and in particular \(\mathrm{M}^{*}_{\sim \mathrm{S}}=\mathrm{O}\). Equivalently:

$$\begin{aligned} \mathrm {X}_{\mathrm {S}}^{*}=\varPhi _{\mathrm {S}}^{*-1}(\mathbf {y})-\gamma ^{-1}\left( \varPhi _{\mathrm {S}}^{\mathrm {T}} \varPhi _{\mathrm {S}}\right) ^{-1}\left( \mathrm {M}^{*}\right) \end{aligned}$$
(41)

Now we compute \(L(\mathrm{X}^{*}_{\mathrm{S}}+\mathrm{H})-L(\mathrm{X}^{*}_{\mathrm{S}})\)

\(=|||\mathrm{X}^{*}_{\mathrm{S}}+\mathrm{H}|||_{1}-|||\mathrm{X}^{*}_{\mathrm{S}}|||_{1}+(1/2)\gamma \left( |\varPhi (\mathrm{X}^{*}_{\mathrm{S}})-\mathbf{y}|_{2}^{2}+2\langle \varPhi (\mathrm{X}^{*}_{\mathrm{S}})-\mathbf{y},\varPhi (\mathrm{H})\rangle +|\varPhi (\mathrm{H})|_{2}^{2}-|\varPhi (\mathrm{X}^{*}_{\mathrm{S}})-\mathbf{y}|_{2}^{2}\right) \)

\(=|||\mathrm{X}^{*}_{\mathrm{S}}+\mathrm{H}|||_{1}-|||\mathrm{X}^{*}_{\mathrm{S}}|||_{1}+\gamma \langle \varPhi (\mathrm{X}^{*}_{\mathrm{S}})-\mathbf{y},\varPhi (\mathrm{H})\rangle +(1/2)\gamma |\varPhi (\mathrm{H})|_{2}^{2}\)

\(=|||\mathrm{X}^{*}_{\mathrm{S}}+\mathrm{H}|||_{1}-|||\mathrm{X}^{*}_{\mathrm{S}}|||_{1}+\gamma \langle \varPhi (\mathrm{X}^{*}_{\mathrm{S}})-\mathbf{y},\varPhi _{\sim \mathrm{S}}(\mathrm{H})\rangle +(1/2)\gamma |\varPhi _{\sim \mathrm{S}}(\mathrm{H})|_{2}^{2}\)

\(\ge |||\mathrm{X}^{*}_{\mathrm{S}}+\mathrm{H}|||_{1}-|||\mathrm{X}^{*}_{\mathrm{S}}|||_{1}+\gamma \langle \varPhi (\mathrm{X}^{*}_{\mathrm{S}})-\mathbf{y},\varPhi _{\sim \mathrm{S}}(\mathrm{H})\rangle \)

The first term \(|||\mathrm{X}^{*}_{\mathrm{S}}+\mathrm{H}|||_{1}-|||\mathrm{X}^{*}_{\mathrm{S}}|||_{1}\)

\(=\max _{j}(|\mathbf{x}^{*}_{j}|_{1}+|\mathbf{h}_{j}|_{1})-\max _{j}|\mathbf{x}^{*}_{j}|_{1}\) (since \(\mathrm{supp}(\mathrm{X}^{*}_{\mathrm{S}})\cap \mathrm{supp}(\mathrm{H})=\varnothing \))

\(=|||\mathrm{X}^{*}_{\mathrm{S}}|||_{1}+|||\mathrm{H}|||_{1}-|||\mathrm{X}^{*}_{\mathrm{S}}|||_{1}\) (condition (1) implies \(\mathrm{X}^{*}_{\mathrm{S}}\)'s \(l_{1}\)-column-flatness: see the remark after Theorem 3)

\(=|||\mathrm{H}|||_{1}\)

Replacing \(\mathrm{X}^{*}_{\mathrm{S}}\) by (41), and noting that \(\mathrm{supp}(\varPhi _{\sim \mathrm{S}}^{\mathrm{T}})=\,\sim \mathrm{S}\) and \(|||\mathrm{M}^{*}|||_{1}^{*}\le 1\), the second term

$$\begin{aligned}&\gamma \langle \varPhi _{\mathrm{S}}(\mathrm{X}^{*}_{\mathrm{S}})-\mathbf{y},\varPhi _{\sim \mathrm{S}}(\mathrm{H})\rangle \nonumber \\&=\gamma \langle \varPhi _{\sim \mathrm{S}}^{\mathrm{T}}(\varPhi _{\mathrm{S}}\varPhi _{\mathrm{S}}^{*-1}(\mathbf{y})-\mathbf{y}),\mathrm{H}\rangle -\langle \mathrm{M}^{*},\varPhi _{\mathrm{S}}^{*-1}\varPhi _{\sim \mathrm{S}}(\mathrm{H})\rangle \nonumber \\&\ge \left( -\gamma \sup \{\langle \varPhi _{\sim \mathrm{S}}^{\mathrm{T}}(\varPhi _{\mathrm{S}}\varPhi _{\mathrm{S}}^{*-1}(\mathbf{y})-\mathbf{y}),\mathrm{H}\rangle : |||\mathrm{H}|||_{1}=1\}\right. \nonumber \\&\left. \quad -\sup \{\langle \varPhi _{\sim \mathrm{S}}^{\mathrm{T}}(\varPhi _{\mathrm{S}}^{*-1})^{\mathrm{T}}\mathrm{M},\mathrm{H}\rangle : |||\mathrm{H}|||_{1}=1,\ |||\mathrm{M}|||_{1}^{*}\le 1\}\right) |||\mathrm{H}|||_{1} \end{aligned}$$
(42)

Therefore

$$\begin{aligned}&L(\mathrm{X}^{*}_{\mathrm{S}}+\mathrm{H})-L(\mathrm{X}^{*}_{\mathrm{S}})\nonumber \\&\ge |||\mathrm{H}|||_{1}\left( 1-\gamma \sup \{\langle \varPhi _{\sim \mathrm{S}}^{\mathrm{T}}(\varPhi _{\mathrm{S}}\varPhi _{\mathrm{S}}^{*-1}(\mathbf{y})-\mathbf{y}),\mathrm{H}\rangle : |||\mathrm{H}|||_{1}=1\}\right. \nonumber \\&\left. \quad -\sup \{\langle \varPhi _{\sim \mathrm{S}}^{\mathrm{T}}(\varPhi _{\mathrm{S}}^{*-1})^{\mathrm{T}}\mathrm{M},\mathrm{H}\rangle : |||\mathrm{H}|||_{1}=1\ \text {and}\ |||\mathrm{M}|||_{1}^{*}\le 1\}\right) \end{aligned}$$
(43)

and condition (3) implies that the right-hand side is \(>0\). This proves that \(\mathrm{X}^{*}_{\mathrm{S}}\) is the minimizer of (22) and that the minimizer is unique.

(3) For \(\mathrm{Y}^{*}=\varPhi _{\mathrm{S}}^{*-1}(\mathbf{y})\in \sum _{s}^{n\times n}(\mathrm{S})\) (so that \(\mathrm{supp}(\mathrm{Y}^{*})\) is in S), by (41) we have

$$\begin{aligned}&|\mathrm{X}^{*}_{\mathrm{S},ij}|\\&=|\mathrm{Y}^{*}_{ij}-\gamma ^{-1}(\varPhi _{\mathrm{S}}^{\mathrm{T}}\varPhi _{\mathrm{S}})^{-1}(\mathrm{M}^{*})_{ij}|\\&\ge |\mathrm{Y}^{*}_{ij}|-\gamma ^{-1}|(\varPhi _{\mathrm{S}}^{\mathrm{T}}\varPhi _{\mathrm{S}})^{-1}(\mathrm{M}^{*})_{ij}|\\&\ge |\mathrm{Y}^{*}_{ij}|-\gamma ^{-1}\max _{ij}|(\varPhi _{\mathrm{S}}^{\mathrm{T}}\varPhi _{\mathrm{S}})^{-1}(\mathrm{M}^{*})_{ij}|\\&=|\mathrm{Y}^{*}_{ij}|-\gamma ^{-1}|(\varPhi _{\mathrm{S}}^{\mathrm{T}}\varPhi _{\mathrm{S}})^{-1}(\mathrm{M}^{*})|_{max}\\&\ge |\mathrm{Y}^{*}_{ij}|-\gamma ^{-1}N((\varPhi _{\mathrm{S}}^{\mathrm{T}}\varPhi _{\mathrm{S}})^{-1}: |||\cdot |||_{1}^{*}\rightarrow |\cdot |_{max})\,|||\mathrm{M}^{*}|||_{1}^{*}\\&\ge |\mathrm{Y}^{*}_{ij}|-\gamma ^{-1}N((\varPhi _{\mathrm{S}}^{\mathrm{T}}\varPhi _{\mathrm{S}})^{-1}: |||\cdot |||_{1}^{*}\rightarrow |\cdot |_{max})\quad (\text {since}\ |||\mathrm{M}^{*}|||_{1}^{*}\le 1)\\&>0\quad \text {for those}\ (i,j): |\mathrm{Y}^{*}_{ij}|>\gamma ^{-1}N((\varPhi _{\mathrm{S}}^{\mathrm{T}}\varPhi _{\mathrm{S}})^{-1}: |||\cdot |||_{1}^{*}\rightarrow |\cdot |_{max}) \end{aligned}$$

(4) Note that for any scalars \(u\) and \(v\), \(|u|>|u-v|\) implies that \(v\ne 0\) and \(\mathrm{sgn}(u)=\mathrm{sgn}(v)\). Therefore

$$\begin{aligned} \mathrm{sgn}(\mathrm{X}_{\mathrm{S},ij}^{*})=\mathrm{sgn}(\mathrm{Y}_{ij}^{*})\ \text { if }\ |\mathrm{Y}_{ij}^{*}|>|\mathrm{Y}_{ij}^{*}-\mathrm{X}_{\mathrm{S},ij}^{*}|=\gamma ^{-1}|(\varPhi _{\mathrm{S}}^{\mathrm{T}}\varPhi _{\mathrm{S}})^{-1}(\mathrm{M}^{*})_{ij}| \end{aligned}$$
(44)

In particular, if \(\min _{(i,j)\ \text {in}\ \mathrm{S}}|\mathrm{Y}^{*}_{ij}|>\gamma ^{-1}N((\varPhi _{\mathrm{S}}^{\mathrm{T}}\varPhi _{\mathrm{S}})^{-1}: |||\cdot |||_{1}^{*}\rightarrow |\cdot |_{max})\) then \(|\mathrm{Y}^{*}_{ij}|>\gamma ^{-1}\max _{(i,j)}|(\varPhi _{\mathrm{S}}^{\mathrm{T}}\varPhi _{\mathrm{S}})^{-1}(\mathrm{M}^{*})_{ij}|\), so \(\mathrm{sgn}(\mathrm{X}^{*}_{\mathrm{S},ij})=\mathrm{sgn}(\mathrm{Y}^{*}_{ij})\) for all (i,j) in S.    \(\square \)
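
To make the role of the penalized objective analyzed in Lemma 2 concrete, here is a minimal CVXPY sketch (an illustration under assumed problem sizes, a generic random measurement map and an arbitrary \(\gamma \); it is not the authors' code) of the support-unrestricted problem (22), \(\inf |||\mathrm{Z}|||_{1}+(1/2)\gamma |\mathbf{y}-\varPhi (\mathrm{Z})|_{2}^{2}\).

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(4)
n, m, gamma = 6, 30, 50.0                  # toy sizes and penalty weight (arbitrary)

# Ground truth: every column equals e_1, hence 1-sparse columns with equal (unit) l1 norms.
X = np.zeros((n, n)); X[0, :] = 1.0

# Random measurement maps Phi(Z)_k = <A_k, Z> and noiseless data y.
A = rng.standard_normal((m, n, n)) / np.sqrt(m)
y = np.einsum('kij,ij->k', A, X)

# Penalized problem (22): inf |||Z|||_1 + (gamma/2) |y - Phi(Z)|_2^2.
Z = cp.Variable((n, n))
phiZ = cp.hstack([cp.sum(cp.multiply(A[k], Z)) for k in range(m)])
objective = cp.max(cp.sum(cp.abs(Z), axis=0)) + (gamma / 2) * cp.sum_squares(y - phiZ)
cp.Problem(cp.Minimize(objective)).solve()

# The approximation quality depends on gamma and on the measurement map.
print("max entrywise error:", np.max(np.abs(Z.value - X)))
```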

Proof

of Lemma 3. (1) Let \(\mathrm{X}^{*}_{\mathrm{S}}\in \mathrm{Arg}\inf \{|||\mathrm{Z}|||_{1}: \mathrm{Z}\in R^{n\times n},\ |\mathbf{y}-\varPhi _{\mathrm{S}}(\mathrm{Z})|_{2}\le \eta \}\), i.e., a minimizer with its support restricted to S. We first prove that \(\mathrm{X}^{*}_{\mathrm{S}}\) is the only minimizer of this support-restricted problem, and then that \(\mathrm{X}^{*}_{\mathrm{S}}\) is also a minimizer of problem \(\mathrm{MP}_{\mathbf{y},\varPhi ,\eta }\) (25), i.e., \(\mathrm{X}^{*}_{\mathrm{S}}\) is the global minimizer and (25)'s minimizer is unique.

According to general convex optimization theory, there exist a positive multiplier \(\gamma ^{*}>0\) and \(\mathrm{M}^{*}\) in \(\partial |||\mathrm{X}^{*}_{\mathrm{S}}|||_{1}\) such that

$$\begin{aligned} \mathrm {M}^{*}+\gamma ^{*} \varPhi _{\mathrm {S}}^{\mathrm {T}}\left( \varPhi _{\mathrm {S}}\left( \mathrm {X}_{\mathrm {S}}^{*}\right) -\mathbf {y}\right) =\mathrm {O} \text{ and } \left| \mathbf {y}-\varPhi _{\mathrm {S}}\left( \mathrm {X}_{\mathrm {S}}^{*}\right) \right| _{2}=\eta \end{aligned}$$
(45)

then equivalently

$$\begin{aligned} \mathrm {X}_{\mathrm {S}}^{*}=\varPhi _{\mathrm {S}}^{*-1}(\mathbf {y})-\gamma ^{*-1}\left( \varPhi _{\mathrm {S}}^{\mathrm {T}} \varPhi _{\mathrm {S}}\right) ^{-1}\left( \mathrm {M}^{*}\right) \end{aligned}$$
(46)

Suppose \(\mathrm{X}^{0}\) is another minimizer of \(\inf \{|||\mathrm{Z}|||_{1}: \mathrm{Z}\in R^{n\times n},\ |\mathbf{y}-\varPhi _{\mathrm{S}}(\mathrm{Z})|_{2}\le \eta \}\); then there exist a positive multiplier \(\gamma ^{0}>0\) and \(\mathrm{M}^{0}\) in \(\partial |||\mathrm{X}^{0}|||_{1}\) such that

$$\begin{aligned} \mathrm {M}^{0}+\gamma ^{0} \varPhi _{\mathrm {S}}^{\mathrm {T}}\left( \varPhi _{\mathrm {S}}\left( \mathrm {X}^{0}\right) -\mathbf {y}\right) =\mathrm {O} \text{ and } \left| \mathbf {y}-\varPhi _{\mathrm {S}}\left( \mathrm {X}^{0}\right) \right| _{2}=\eta \end{aligned}$$
(47)

Equivalently, (45) shows that \(\mathrm{X}^{*}_{\mathrm{S}}\) is also a minimizer of \(L_{\mathrm{S}}(\mathrm{Z})=|||\mathrm{Z}|||_{1}+(1/2)\gamma ^{*}|\mathbf{y}-\varPhi _{\mathrm{S}}(\mathrm{Z})|_{2}^{2}\), which is a strictly convex function on \(\sum _{s}^{n\times n}(\mathrm{S})\) since \(\varPhi _{\mathrm{S}}^{\mathrm{T}}\varPhi _{\mathrm{S}}\) is a bijection (condition (2)); as a result \(L_{\mathrm{S}}(\mathrm{Z})\)'s minimizer is unique. However, since \(|||\mathrm{X}^{*}_{\mathrm{S}}|||_{1}=|||\mathrm{X}^{0}|||_{1}\) we have \(L_{\mathrm{S}}(\mathrm{X}^{*}_{\mathrm{S}})=|||\mathrm{X}^{*}_{\mathrm{S}}|||_{1}+(1/2)\gamma ^{*}|\mathbf{y}-\varPhi _{\mathrm{S}}(\mathrm{X}^{*}_{\mathrm{S}})|_{2}^{2}=|||\mathrm{X}^{*}_{\mathrm{S}}|||_{1}+\gamma ^{*}\eta ^{2}/2=|||\mathrm{X}^{0}|||_{1}+(\gamma ^{*}/2)|\mathbf{y}-\varPhi _{\mathrm{S}}(\mathrm{X}^{0})|_{2}^{2}=L_{\mathrm{S}}(\mathrm{X}^{0})\), which implies \(\mathrm{X}^{*}_{\mathrm{S}}=\mathrm{X}^{0}\), i.e., \(\mathrm{X}^{*}_{\mathrm{S}}\) is the unique minimizer of the support-restricted problem \(\inf \{|||\mathrm{Z}|||_{1}: \mathrm{Z}\in R^{n\times n},\ |\mathbf{y}-\varPhi _{\mathrm{S}}(\mathrm{Z})|_{2}\le \eta \}\).

\(\mathrm{X}^{*}_{\mathrm{S}}\)'s \(l_{1}\)-column-flatness is implied by condition (1) and Theorem 3.

Now we prove that \(\mathrm{X}^{*}_{\mathrm{S}}\) (which is S-sparse and \(l_{1}\)-column-flat) is also a minimizer of problem \(\mathrm{MP}_{\mathbf{y},\varPhi ,\eta }\) (25). Again we start from the fact that \(\mathrm{X}^{*}_{\mathrm{S}}=\mathrm{Arg}\inf L_{\mathrm{S}}(\mathrm{Z})=\mathrm{Arg}\inf |||\mathrm{Z}|||_{1}+(1/2)\gamma ^{*}|\mathbf{y}-\varPhi _{\mathrm{S}}(\mathrm{Z})|_{2}^{2}\) for some multiplier \(\gamma ^{*}>0\) (whose value depends on \(\mathrm{X}^{*}_{\mathrm{S}}\)), and by Lemma 2, \(\mathrm{X}^{*}_{\mathrm{S}}\) is the unique minimizer of the convex problem (without any restriction on the solution's support)

$$\begin{aligned} \inf |||\mathrm{Z}|||_{1}+(1/2)\gamma ^{*}|\mathbf{y}-\varPhi (\mathrm{Z})|_{2}^{2} \end{aligned}$$
(48)

under the condition

$$\begin{aligned} \begin{array}{l}{\gamma ^{*}\sup \{\langle \varPhi _{\sim \mathrm{S}}^{\mathrm{T}}(\varPhi _{\mathrm{S}}\varPhi _{\mathrm{S}}^{*-1}(\mathbf{y})-\mathbf{y}),\mathrm{H}\rangle : |||\mathrm{H}|||_{1}=1\}}\\ {+\sup \{\langle \varPhi _{\sim \mathrm{S}}^{\mathrm{T}}(\varPhi _{\mathrm{S}}^{*-1})^{\mathrm{T}}(\mathrm{M}),\mathrm{H}\rangle : |||\mathrm{H}|||_{1}=1\ \text {and}\ |||\mathrm{M}|||_{1}^{*}\le 1\}<1}\end{array} \end{aligned}$$
(49)

According to convex optimization theory, \(\mathrm{X}^{*}_{\mathrm{S}}\) (under condition (49)) being the unique minimizer of problem (48) means that \(\mathrm{X}^{*}_{\mathrm{S}}\) is also a minimizer of \(\mathrm{MP}_{\mathbf{y},\varPhi ,\eta }\) (25), which furthermore implies that \(\mathrm{MP}_{\mathbf{y},\varPhi ,\eta }\)'s minimizer is unique, S-sparse and \(l_{1}\)-column-flat.

In order to make condition (49) more meaningful, we need to replace the minimizer-dependent parameter \(\gamma ^{*}\) with explicit information. From (48)'s first-order optimality condition (45) we obtain

\(1\ge |||\mathrm{M}^{*}|||_{1}^{*}=\gamma ^{*}|||\varPhi _{\mathrm{S}}^{\mathrm{T}}(\varPhi _{\mathrm{S}}(\mathrm{X}^{*}_{\mathrm{S}})-\mathbf{y})|||_{1}^{*}\ge \gamma ^{*}\min \{|||\varPhi _{\mathrm{S}}^{\mathrm{T}}(\mathbf{z})|||_{1}^{*}: |\mathbf{z}|_{2}=1\}\,|\varPhi _{\mathrm{S}}(\mathrm{X}^{*}_{\mathrm{S}})-\mathbf{y}|_{2}=\gamma ^{*}\eta \varLambda _{min}(\varPhi _{\mathrm{S}}^{\mathrm{T}})\)

i.e.,

$$\begin{aligned} \gamma ^{*} \le \left( \eta \varLambda _{\min }\left( \varPhi _{\mathrm {S}}^{\mathrm {T}}\right) \right) ^{-1} \end{aligned}$$
(50)

With this upper bound on \(\gamma ^{*}\), (49) can be derived from the uniform condition

$$\begin{aligned} \begin{aligned}&(\eta \varLambda _{\min }(\varPhi _{\mathrm{S}}^{\mathrm{T}}))^{-1}\sup \{\langle \varPhi _{\sim \mathrm{S}}^{\mathrm{T}}(\varPhi _{\mathrm{S}}\varPhi _{\mathrm{S}}^{*-1}(\mathbf{y})-\mathbf{y}),\mathrm{H}\rangle : |||\mathrm{H}|||_{1}=1\}\\&+\sup \{\langle \varPhi _{\sim \mathrm{S}}^{\mathrm{T}}(\varPhi _{\mathrm{S}}^{*-1})^{\mathrm{T}}(\mathrm{M}),\mathrm{H}\rangle : |||\mathrm{H}|||_{1}=1\ \text {and}\ |||\mathrm{M}|||_{1}^{*}\le 1\}<1 \end{aligned} \end{aligned}$$
(51)

which is equivalent to condition (3).

From now on we denote \(\text {X}^{*}_{S}\) as \(\text {X}^{*}\).

(2) For \(\mathrm{Y}^{*}=\varPhi _{\mathrm{S}}^{*-1}(\mathbf{y})\in \sum _{s}^{n\times n}(\mathrm{S})\), by Lemma 2's conclusion (4), if \(\min _{(i,j)\ \text {in}\ \mathrm{S}}|\mathrm{Y}^{*}_{ij}|>\gamma ^{*-1}N((\varPhi _{\mathrm{S}}^{\mathrm{T}}\varPhi _{\mathrm{S}})^{-1}: |||\cdot |||_{1}^{*}\rightarrow |\cdot |_{max})\) then \(\mathrm{sgn}(\mathrm{X}^{*}_{\mathrm{S},ij})=\mathrm{sgn}(\mathrm{Y}^{*}_{ij})\) for all (i,j) in S. To replace the multiplier \(\gamma ^{*}\) with more explicit information in this condition, we need some lower bound on \(\gamma ^{*}\), which can be derived from the first-order optimality condition \(\mathrm{M}^{*}=\gamma ^{*}\varPhi _{\mathrm{S}}^{\mathrm{T}}(\mathbf{y}-\varPhi _{\mathrm{S}}(\mathrm{X}^{*}))\) again. Note that \(\mathrm{X}^{*}\) being \(l_{1}\)-column-flat implies that every column of \(\mathrm{X}^{*}\) is nonzero; furthermore \(\mathrm{M}^{*}\) has no zero column, so \(\mathrm{M}^{*}=(\lambda _{1}\mathbf{u}_{1},\ldots ,\lambda _{n}\mathbf{u}_{n})\) with \(\lambda _{j}>0\) for all \(j\), \(\lambda _{1}+\ldots +\lambda _{n}=1\) and \(|\mathbf{u}_{j}|_{\infty }=1\); as a result \(|||\mathrm{M}^{*}|||_{1}^{*}=\sum _{j}\lambda _{j}|\mathbf{u}_{j}|_{\infty }=1\). Hence

\(1=|||\mathrm{M}^{*}|||_{1}^{*}\le \gamma ^{*}|||\varPhi _{\mathrm{S}}^{\mathrm{T}}(\varPhi _{\mathrm{S}}(\mathrm{X}^{*})-\mathbf{y})|||_{1}^{*}\le \gamma ^{*}N(\varPhi _{\mathrm{S}}^{\mathrm{T}}: l_{2}\rightarrow |||\cdot |||_{1}^{*})\,|\varPhi _{\mathrm{S}}(\mathrm{X}^{*})-\mathbf{y}|_{2}=\gamma ^{*}\eta N(\varPhi _{\mathrm{S}}^{\mathrm{T}}: l_{2}\rightarrow |||\cdot |||_{1}^{*})\)

i.e.,

$$\begin{aligned} \gamma ^{*-1} \le \eta N\left( \varPhi _{\mathrm {S}}^{\mathrm {T}}: l_{2} \rightarrow |||{.}|||_{1}^{*}\right) \end{aligned}$$
(52)

Replacing \(\gamma ^{*-1}\) by its upper bound in (52), we obtain: if \(\min _{(i,j)\ \text {in}\ \mathrm{S}}|\mathrm{Y}^{*}_{ij}|>\eta N(\varPhi _{\mathrm{S}}^{\mathrm{T}}: l_{2}\rightarrow |||\cdot |||_{1}^{*})\,N((\varPhi _{\mathrm{S}}^{\mathrm{T}}\varPhi _{\mathrm{S}})^{-1}: |||\cdot |||_{1}^{*}\rightarrow |\cdot |_{max})\) then \(\mathrm{sgn}(\mathrm{X}^{*}_{\mathrm{S},ij})=\mathrm{sgn}(\mathrm{Y}^{*}_{ij})\) for all (i,j) in S.

(3) \(\mathrm{Y}^{*}=\varPhi _{\mathrm{S}}^{*-1}(\mathbf{y})\in \sum _{s}^{n\times n}(\mathrm{S})\) implies \(\varPhi _{\mathrm{S}}^{\mathrm{T}}(\varPhi _{\mathrm{S}}(\mathrm{Y}^{*})-\mathbf{y})=\mathrm{O}\), and then condition (1) leads to \(\varPhi _{\mathrm{S}}(\mathrm{Y}^{*})=\mathbf{y}\). Furthermore, \(\varPhi _{\mathrm{S}}^{\mathrm{T}}\varPhi _{\mathrm{S}}\) is a bijection \(\sum _{s}^{n\times n}(\mathrm{S})\rightarrow \sum _{s}^{n\times n}(\mathrm{S})\) and \(\mathrm{X}^{*}-\mathrm{Y}^{*}\in \sum _{s}^{n\times n}(\mathrm{S})\), so for any matrix norm \(|\cdot |_{\alpha }\):

\(|\mathrm{X}^{*}-\mathrm{Y}^{*}|_{\alpha }=|(\varPhi _{\mathrm{S}}^{\mathrm{T}}\varPhi _{\mathrm{S}})^{-1}(\varPhi _{\mathrm{S}}^{\mathrm{T}}\varPhi _{\mathrm{S}})(\mathrm{X}^{*}-\mathrm{Y}^{*})|_{\alpha }=|\varPhi _{\mathrm{S}}^{*-1}\varPhi _{\mathrm{S}}(\mathrm{X}^{*}-\mathrm{Y}^{*})|_{\alpha }=|\varPhi _{\mathrm{S}}^{*-1}(\varPhi _{\mathrm{S}}(\mathrm{X}^{*})-\mathbf{y})|_{\alpha }\)

\(\le N(\varPhi _{\mathrm{S}}^{*-1}: l_{2}\rightarrow |\cdot |_{\alpha })\,|\varPhi _{\mathrm{S}}(\mathrm{X}^{*})-\mathbf{y}|_{2}=\eta N(\varPhi _{\mathrm{S}}^{*-1}: l_{2}\rightarrow |\cdot |_{\alpha })\)    \(\square \)

Proof

of Theorem 6. (1) Note that in the case \(\mathrm{X}\in \sum _{s}^{n\times n}(\mathrm{R})\) and \(\mathbf{y}=\varPhi (\mathrm{X})+\textit{\textbf{e}}=\varPhi _{\mathrm{R}}(\mathrm{X})+\textit{\textbf{e}}\), \(|\textit{\textbf{e}}|_{2}\le \eta \), we have

$$\begin{aligned} \varPhi {}_{R}\varPhi {}_{R}^{*-1}(\mathbf{y} ) - \mathbf{y} = (\varPhi {}_{R}\varPhi {}_{R}^{*-1} - \text {I}_{R}){{\textit{\textbf{e}}}} \end{aligned}$$

It is straightforward to verify that in this situation condition (3) of this theorem leads to condition (3) of Lemma 3: \(\sup \{\langle \varPhi _{\sim \mathrm{R}}^{\mathrm{T}}(\varPhi _{\mathrm{R}}\varPhi _{\mathrm{R}}^{*-1}(\mathbf{y})-\mathbf{y}),\mathrm{H}\rangle : |||\mathrm{H}|||_{1}=1\}<\eta \varLambda _{min}(\varPhi _{\mathrm{R}}^{\mathrm{T}})\,(1-N(\varPhi _{\mathrm{R}}^{*-1}\varPhi _{\sim \mathrm{R}}: |||\cdot |||_{1}^{*}\rightarrow |||\cdot |||_{1}^{*}))\)

for any \(\eta \). As a result, \(\mathrm{X}^{*}\in \sum _{s}^{n\times n}(\mathrm{R})\), is \(l_{1}\)-column-flat, and is the unique minimizer of \(\mathrm{MP}_{\mathbf{y},\varPhi ,\eta }\).

(2) For \(\mathrm{Y}^{*}=\varPhi _{\mathrm{R}}^{*-1}(\mathbf{y})\in \sum _{s}^{n\times n}(\mathrm{R})\), by Lemma 3(4) we obtain \(|\mathrm{X}^{*}-\mathrm{Y}^{*}|_{\alpha }\le \eta N(\varPhi _{\mathrm{R}}^{*-1}: l_{2}\rightarrow |\cdot |_{\alpha })\) for any given matrix norm \(|\cdot |_{\alpha }\). On the other hand, \(\mathrm{Y}^{*}=\varPhi _{\mathrm{R}}^{*-1}(\mathbf{y})\) implies \(\varPhi _{\mathrm{R}}^{\mathrm{T}}(\varPhi _{\mathrm{R}}(\mathrm{Y}^{*})-\mathbf{y})=\mathrm{O}\), and then condition (1) leads to \(\varPhi _{\mathrm{R}}(\mathrm{Y}^{*})=\mathbf{y}\); hence \(\varPhi _{\mathrm{R}}(\mathrm{Y}^{*})=\mathbf{y}=\varPhi (\mathrm{X})+\textit{\textbf{e}}=\varPhi _{\mathrm{R}}(\mathrm{X})+\textit{\textbf{e}}\), namely \(\varPhi _{\mathrm{R}}^{\mathrm{T}}\varPhi _{\mathrm{R}}(\mathrm{Y}^{*})=\varPhi _{\mathrm{R}}^{\mathrm{T}}\varPhi _{\mathrm{R}}(\mathrm{X})+\varPhi _{\mathrm{R}}^{\mathrm{T}}(\textit{\textbf{e}})\), and as a result:

$$\begin{aligned} \mathrm {Y}^{*}-\mathrm {X}=\left( \varPhi _{\mathrm {R}}^{\mathrm {T}} \varPhi _{\mathrm {R}}\right) ^{-1} \varPhi _{\mathrm {R}}^{\mathrm {T}}(e) \equiv \varPhi _{\mathrm {R}}^{*-1}(e) \end{aligned}$$
(53)

Since \(|\textit{\textbf{e}}|_{2}\le \eta \), we get \(|\mathrm{Y}^{*}-\mathrm{X}|_{\alpha }\le \eta N(\varPhi _{\mathrm{R}}^{*-1}: l_{2}\rightarrow |\cdot |_{\alpha })\) for any given matrix norm \(|\cdot |_{\alpha }\). Combining this with \(|\mathrm{X}^{*}-\mathrm{Y}^{*}|_{\alpha }\le \eta N(\varPhi _{\mathrm{R}}^{*-1}: l_{2}\rightarrow |\cdot |_{\alpha })\), we get the reconstruction error bound \(|\mathrm{X}^{*}-\mathrm{X}|_{\alpha }\le 2\eta N(\varPhi _{\mathrm{R}}^{*-1}: l_{2}\rightarrow |\cdot |_{\alpha })\).

(3) By the first-order optimality condition at the minimizer \(\mathrm{X}^{*}\), together with the fact \(\mathrm{supp}(\mathrm{X}^{*})=\mathrm{R}\), we have \(\mathrm{X}^{*}=\varPhi _{\mathrm{R}}^{*-1}(\mathbf{y})-\gamma ^{*-1}(\varPhi _{\mathrm{R}}^{\mathrm{T}}\varPhi _{\mathrm{R}})^{-1}(\mathrm{M}^{*})=\mathrm{Y}^{*}-\gamma ^{*-1}(\varPhi _{\mathrm{R}}^{\mathrm{T}}\varPhi _{\mathrm{R}})^{-1}(\mathrm{M}^{*})\), where \(\mathrm{M}^{*}\) is in \(\partial |||\mathrm{X}^{*}|||_{1}\), namely:

$$\begin{aligned} \mathrm {X}^{*}-\mathrm {Y}^{*}=-\gamma ^{*-1}\left( \varPhi _{\mathrm {R}}^{\mathrm {T}} \varPhi _{\mathrm {R}}\right) ^{-1}\left( \mathrm {M}^{*}\right) \end{aligned}$$
(54)

Combining with (53), we get

$$\begin{aligned} \mathrm {X}^{*}-\mathrm {X}=\varPhi _{\mathrm {R}}^{*-1}(e)-\gamma ^{*-1}\left( \varPhi _{\mathrm {R}}^{\mathrm {T}} \varPhi _{\mathrm {R}}\right) ^{-1}\left( \mathrm {M}^{*}\right) \end{aligned}$$
(55)

Since \(|\mathrm{X}_{ij}|>|\mathrm{X}_{ij}-\mathrm{X}^{*}_{ij}|=|\varPhi _{\mathrm{R}}^{*-1}(\textit{\textbf{e}})_{ij}-\gamma ^{*-1}(\varPhi _{\mathrm{R}}^{\mathrm{T}}\varPhi _{\mathrm{R}})^{-1}(\mathrm{M}^{*})_{ij}|\) implies \(\mathrm{sgn}(\mathrm{X}^{*}_{ij})=\mathrm{sgn}(\mathrm{X}_{ij})\), in particular, if \(\mathrm{X}_{ij}\) satisfies \(|\mathrm{X}_{ij}|>\max _{ij}|\varPhi _{\mathrm{R}}^{*-1}(\textit{\textbf{e}})_{ij}|+\gamma ^{*-1}\max _{ij}|(\varPhi _{\mathrm{R}}^{\mathrm{T}}\varPhi _{\mathrm{R}})^{-1}(\mathrm{M}^{*})_{ij}|\) then the former inequality holds, and as a result \(\mathrm{sgn}(\mathrm{X}^{*}_{ij})=\mathrm{sgn}(\mathrm{X}_{ij})\). It is straightforward to verify (using (52)) that condition (3) provides exactly this guarantee.    \(\square \)

Appendix C: Proofs of Theorems in Section 5

Proof

of Lemma 4. We start from (FACT 4): \(w^{2}(\mathrm{D}(|||\cdot |||_{1},\mathrm{X}))\le \mathrm{E}_{G}[\inf \{|\mathrm{G}-t\mathrm{V}|_{F}^{2}: t>0,\ \mathrm{V}\ \text {in}\ \partial |||\mathrm{X}|||_{1}\}]\), where G is a random matrix with entries \(\mathrm{G}_{ij}\sim ^{iid}N(0,1)\).

Set \(\mathrm{G}=(\mathbf{g}_{1},\ldots ,\mathbf{g}_{n})\) where \(\mathbf{g}_{j}\sim ^{iid}N(0,\mathrm{I}_{n})\). By Lemma 1, \(\mathrm{V}=(\lambda _{1}\xi _{1},\ldots ,\lambda _{n}\xi _{n})\) where, w.l.o.g., \(\lambda _{j}\ge 0\) for \(j=1,\ldots ,r\), \(\lambda _{1}+\ldots +\lambda _{r}=1\), \(\lambda _{j}=0\) for \(j\ge r+1\); \(|\mathbf{x}_{j}|_{1}=\max _{k}|\mathbf{x}_{k}|_{1}\) for \(j=1,\ldots ,r\) and \(|\mathbf{x}_{j}|_{1}<\max _{k}|\mathbf{x}_{k}|_{1}\) for \(j\ge r+1\); \(\xi _{j}(i)=\mathrm{sgn}(\mathrm{X}_{ij})\) for \(\mathrm{X}_{ij}\ne 0\) and \(|\xi _{j}(i)|\le 1\) for all \(i\) and \(j\). Then

$$\begin{aligned}&w^{2}(\mathrm{D}(|||\cdot |||_{1},\mathrm{X}))\\&\le \mathrm{E}_{G}\left[ \inf _{t>0,\ \lambda _{j},\xi _{j}\ \text {specified as above}}\sum \nolimits _{j=1}^{r}|\mathbf{g}_{j}-t\lambda _{j}\xi _{j}|_{2}^{2}+\sum \nolimits _{j=r+1}^{n}|\mathbf{g}_{j}|_{2}^{2}\right] \\&\le \inf _{t>0,\ \lambda _{j}\ \text {specified as above}}\mathrm{E}_{G}\left[ \inf _{\xi _{j}\ \text {specified as above}}\sum \nolimits _{j=1}^{r}|\mathbf{g}_{j}-t\lambda _{j}\xi _{j}|_{2}^{2}+\sum \nolimits _{j=r+1}^{n}|\mathbf{g}_{j}|_{2}^{2}\right] \\&=\inf _{t>0,\ \lambda _{j}\ \text {specified as above}}\mathrm{E}_{G}\left[ \inf _{\xi _{j}\ \text {specified as above}}\sum \nolimits _{j=1}^{r}|\mathbf{g}_{j}-t\lambda _{j}\xi _{j}|_{2}^{2}\right] +\sum \nolimits _{j=r+1}^{n}\mathrm{E}_{G}\left[ |\mathbf{g}_{j}|_{2}^{2}\right] \\&=\inf _{t>0,\ \lambda _{j}\ \text {specified as above}}\mathrm{E}_{G}\left[ \sum \nolimits _{j=1}^{r}\inf _{\xi _{j}\ \text {specified as above}}|\mathbf{g}_{j}-t\lambda _{j}\xi _{j}|_{2}^{2}\right] +(n-r)n\\&\qquad (\text {since the}\ \xi _{j}\ \text {are unrelated to each other and}\ \mathrm{E}_{G}[|\mathbf{g}_{j}|_{2}^{2}]=n)\\&=\inf _{t>0,\ \lambda _{j}\ \text {specified as above}}\sum \nolimits _{j=1}^{r}\mathrm{E}_{gj}\left[ \inf _{\xi _{j}\ \text {specified as above}}|\mathbf{g}_{j}-t\lambda _{j}\xi _{j}|_{2}^{2}\right] +(n-r)n \end{aligned}$$

For each \(j=1, \ldots , r\) let \(S(j)\) be the support of \(\mathbf {x}_{j}\) (so \(|S(j)| \le s\)) and \(\sim S(j)\) be its complementary set; then \(\left| \mathbf {g}_{j}-t \lambda _{j}\xi _{j}\right| _{2}^{2}=\left| \mathbf {g}_{j|S(j)}-t \lambda _{j} \xi _{j|S(j)}\right| _{2}^{2}+\left| \mathbf {g}_{j|\sim S(j)}-t \lambda _{j} \xi _{j|\sim S(j)}\right| _{2}^{2}\). Notice that all components of \(\xi _{j|S(j)}\) are \(\pm 1\) and all components of \(\xi _{j|\sim S(j)}\) can be any value in the interval \([-1,+1]\). Select \(\lambda _{1}=\ldots =\lambda _{r}=1/r\), let \(\varepsilon >0\) be an arbitrarily small positive number and select \(t=t(\varepsilon )\) such that \(\mathrm {P}[|g|>t(\varepsilon )/r] \le \varepsilon \) where g is a standard scalar Gaussian random variable (i.e., \(g \sim N(0,1)\), so \(\varepsilon \) can be taken as \(\exp \left( -t(\varepsilon )^{2}/2 r^{2}\right) \)). For each j and each i outside S(j), set \(\xi _j\)'s component \(\xi _{j}(i)=r g_{j}(i)/t(\varepsilon )\) if \(\left| g_{j}(i)\right| \le t(\varepsilon )/r\) (in this case \(\left| g_{j}(i)-t \lambda _{j} \xi _{j}(i)\right| =0\)) and otherwise \(\xi _{j}(i)=\text {sgn}\left( g_{j}(i)\right) \) (in this case \(\left| g_{j}(i)-t \lambda _{j} \xi _{j}(i)\right| =\left| g_{j}(i)\right| -t(\varepsilon )/r\)); then \(\left| \mathbf {g}_{j|\sim S(j)}-t \lambda _{j} \xi _{j|\sim S(j)}\right| _{2}^{2}=0\) when \(| \mathbf {g}_{j|\sim S(j)}|_{\infty }<t(\varepsilon )/r\), hence:

\(\mathrm {E}\left[ \left| \mathbf {g}_{j|\sim S(j)}-t \lambda _{j} \xi _{ j|\sim S(j)}\right| _{2}^{2}\right] = \int _{0}^{\infty } d u \mathrm {P}\left[ \left| \mathbf {g}_{j| \sim S(j)}-t \lambda _{j} \xi _{j| \sim S(j)}\right| _ 2^{2}>u\right] \)

\(=2 \int _{0}^{\infty } du\, u\, \mathrm {P}\left[ \left| \mathbf {g}_{j|\sim S(j)}-t \lambda _{j} \xi _{j|\sim S(j)}\right| _{2}>u\right] \)

\(\le 2 \int _{0}^{\infty } du\, u\, \mathrm {P}\left[ \text {some component of } \mathbf {g}_{j|\sim S(j)}-t \lambda _{j} \xi _{j|\sim S(j)}\text { has magnitude}>(n-s)^{-1/2} u\right] \)

\(\le 2(n-s) \int _{0}^{\infty } d u u P\left[ |g|-t(\varepsilon )/r>(n-s)^{-1/2} u\right] \)

\(\le 2(n-s) \int _{0}^{\infty } d u u \exp \left( -\left( (t(\varepsilon )/r)+(n-s)^{-1/2} u\right) ^{2}/2\right) \)

\(\le C_{0}(n-s)^{2} \exp \left( -t(\varepsilon )^{2}/2 r^{2}\right) \le C_{0}(n-s)^{2} \varepsilon \)
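The last inequality can be checked with the substitution \(v=(n-s)^{-1/2}u\) together with the elementary bound \(\int _{0}^{\infty } v\,e^{-(a+v)^{2}/2}dv\le \int _{0}^{\infty } (a+v)\,e^{-(a+v)^{2}/2}dv=e^{-a^{2}/2}\); writing \(a=t(\varepsilon )/r\), a sketch of this step is

$$\begin{aligned} 2(n-s) \int _{0}^{\infty } du\, u\, e^{-\left( a+(n-s)^{-1/2} u\right) ^{2}/2}=2(n-s)^{2} \int _{0}^{\infty } dv\, v\, e^{-(a+v)^{2}/2}\le 2(n-s)^{2}e^{-a^{2}/2}=2(n-s)^{2}\varepsilon . \end{aligned}$$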

where C\(_{0}\) is an absolute constant. On the other hand:

\(\mathrm {E}_{g_{j}}[\vert \mathbf {g} _{j\vert S(j)}-t\lambda _{j}\xi _{j\vert S(j)}\vert _{2}^{2}] = \mathrm {E}_{g_{j}}[\vert \mathbf {g} _{j\vert S(j)}\vert _{2}^{2}] +(t(\varepsilon )^{2}/r^{2})\vert \xi _{j\vert S(j)}\vert _{2}^{2}\le (1+t(\varepsilon )^{2}/r^{2})s =(1+2\log (1/\varepsilon ))s\), where the cross term vanishes because \(\mathbf {g}_{j}\) is zero-mean and \(\xi _{j\vert S(j)}\) is deterministic (its entries are the signs of the corresponding \(\text {X}_{ij}\)).

Hence \(w^{2}(\text {D}(\vert \vert \vert \cdot \vert \vert \vert _{1}, \text {X})) \le (1+2\log (1/\varepsilon ))rs + (n-r)n + C_{0}r(n-s)^{2}\varepsilon \le n^{2} - r(n-s\log (e/\varepsilon ^{2})) + C_{0}n^{2}r\varepsilon \), where the second inequality uses \(1+2\log (1/\varepsilon )=\log (e/\varepsilon ^{2})\) and \((n-s)^{2}\le n^{2}\).

In particular, letting \(\varepsilon =1/C_{0}n^{2}r\) we get \(w^{2}(\text {D}(\vert \vert \vert \cdot \vert \vert \vert _{1}, \text {X})) \le n^{2} - r(n - s\log (Cn^{4}r^{2})) + 1\) with \(C=eC_{0}^{2}\).    \(\square \)
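As an illustrative aside (not part of the original argument), the following Python sketch evaluates the explicit construction used above, namely \(\lambda _{j}=1/r\), \(t=t(\varepsilon )\) and the closed-form inner infimum over \(\xi _{j}\), by Monte Carlo and compares the result with the closed-form bound of Lemma 4. The sizes n, s, r, the constant \(C_{0}=2\) and the trial count are illustrative choices, not values prescribed by the paper.

```python
import numpy as np

# Monte Carlo sketch (illustrative only): evaluate the proof's explicit upper bound on
# w^2(D(|||.|||_1, X)) for a column-wise s-sparse, l1-column-flat X, and compare it with
# the closed-form bound n^2 - r(n - s*log(C n^4 r^2)) + 1.  All sizes are assumptions.

rng = np.random.default_rng(0)
n, s, r = 64, 4, 8                      # ambient size, column sparsity, number of "flat" columns
C0 = 2.0                                # absolute constant from the tail-integral step
eps = 1.0 / (C0 * n**2 * r)             # the choice made at the end of the proof
t_over_r = np.sqrt(2.0 * np.log(1.0 / eps))   # so that exp(-(t/r)^2 / 2) = eps

# Column-wise s-sparse X whose first r columns share the maximal l1 norm.
X = np.zeros((n, n))
for j in range(n):
    supp = rng.choice(n, size=s, replace=False)
    X[supp, j] = rng.standard_normal(s)
    X[:, j] *= (1.0 if j < r else 0.5) / np.abs(X[:, j]).sum()

def inner_inf(g_col, x_col, c):
    """inf over admissible xi_j of |g_j - c*xi_j|_2^2: xi_j = sgn(X_ij) on the support,
    any value in [-1, 1] off the support."""
    supp = x_col != 0
    on = np.sum((g_col[supp] - c * np.sign(x_col[supp])) ** 2)
    off = np.sum(np.maximum(np.abs(g_col[~supp]) - c, 0.0) ** 2)
    return on + off

trials, acc = 20000, 0.0
for _ in range(trials):
    G = rng.standard_normal((n, r))     # only the first r columns enter the infimum
    acc += sum(inner_inf(G[:, j], X[:, j], t_over_r) for j in range(r))
estimate = acc / trials + (n - r) * n   # the remaining columns contribute E|g_j|_2^2 = n each

bound = n**2 - r * (n - s * np.log(np.e * C0**2 * n**4 * r**2)) + 1
print(f"Monte Carlo estimate of the constructed upper bound: {estimate:.1f}")
print(f"closed-form bound of Lemma 4                        : {bound:.1f}")
```

On a typical run the two printed values nearly coincide, which is expected: the closed-form bound is obtained by evaluating exactly this construction.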

Proof

of Theorem 9. For any \(s<n\), there exist \(k \ge (n/4s)^{ns/2}\) subsets \(\text {S}^{(\alpha \beta \ldots \omega )} = \text {S}_{1}^{(\alpha )}\cup \text {S}_{2}^{(\beta )} \cup \ldots \cup \text {S}_{n}^{(\omega )}\) in \(\{(i,j): 1\le i, j\le n\}\) where each \(\text {S}_{j}^{(\mu )}=\{(i_{1},j),\ldots , (i_{s},j): 1\le i_{1}<i_{2}<\ldots <i_{s}\le n\}\) and \(\vert \text {S}_{j}^{(\mu )}\cap \text {S}_{j}^{(\nu )}\vert < s/2\) for \(\mu \not =\nu \). This fact is based on a combinatorial theorem [11]: for any \(s<n\) there exist \(l \ge (n/4s)^{s/2}\) subsets \(\text {R}^{(\mu )}\) in \(\{1,2,\ldots ,n\}\) with \(\vert \text {R}^{(\mu )}\cap \text {R}^{(\nu )}\vert < s/2\) for any \(\mu \not =\nu \). For the n-by-n square \(\{(i,j): 1\le i, j\le n\}\), assign an \(\text {R}^{(\mu )}\) to each column, i.e., set \(\text {S}_{j}^{(\mu )}:=\{(i,j): i\in \text {R}^{(\mu )}\}\). As a result \(\vert \text {S}_{j}^{(\mu )}\cap \text {S}_{j}^{(\nu )}\vert <s/2\) for \(\mu \not =\nu \) since \(\vert \text {R}^{(\mu )}\cap \text {R}^{(\nu )}\vert < s/2\) for \(\mu \not =\nu \), and in total there are \(k=l^{n}\ge (n/4s)^{ns/2}\) such assignments \(\text {S}^{(\alpha \beta \ldots \omega )}=\text {S}_{1}^{(\alpha )}\cup \text {S}_{2}^{(\beta )}\cup \ldots \cup \text {S}_{n}^{(\omega )}\) on the square.

Now we call the above \(\text {S}_{1}^{(\alpha )}\cup \text {S}_{2}^{(\beta )}\cup \ldots \cup \text {S}_{n}^{(\omega )}\) a configuration on the n-by-n square. Let m be the rank of the linear operator \(\varPhi \). Consider the quotient space \(L:=R^{n\times n}/\ker \varPhi \); then \(\dim L=n^{2}-\dim \ker \varPhi =m\). For any [X] in L define the norm \(\vert [\text {X}]\vert :=\inf \{\vert \vert \vert \text {X}-\text {V}\vert \vert \vert _{1}: \text {V in }\ker \varPhi \}\). For any \(\text {X}=(\mathbf{x} _{1},\ldots ,\mathbf{x} _{n})\) with \(\mathbf{x} _{j}\) in \(\sum ^{2S}\) for all j, the assumption about \(\varPhi \) implies \(\vert [\text {X}]\vert =\vert \vert \vert \text {X}\vert \vert \vert _{1}\). Now for any configuration \(\varDelta =\text {S}_{1}\cup \text {S}_{2}\cup \ldots \cup \text {S}_{n}\) on the n-by-n square, define \(\text {X}_{ij}(\varDelta ):=1/s\) if \((i,j)\in \text {S}_{j}\) and 0 otherwise; then \(\vert \vert \vert \text {X}(\varDelta )\vert \vert \vert _{1}=1\), each column \(\mathbf{x} _{j}(\varDelta )\in \sum ^{S}\) and each column of \(\text {X}(\varDelta ')-\text {X}(\varDelta '')\) is in \(\sum ^{2S}\); furthermore \(\vert [\text {X}(\varDelta ')]-[\text {X}(\varDelta '')]\vert =\vert \vert \vert \text {X}(\varDelta ')-\text {X}(\varDelta '')\vert \vert \vert _{1}>1\) because of the property \(\vert \text {S}_{j}'\cap \text {S}_{j}''\vert <s/2\) for \(\text {S}_{j}'\not =\text {S}_{j}''\). These facts imply that the set \(\varTheta :=\{[\text {X}(\varDelta )]: \varDelta \text { runs over all configurations}\}\) is a subset of the unit sphere of the normed quotient space L whose members are pairwise more than distance 1 apart, i.e., a d-separated subset of the sphere with d > 1. The cardinality of \(\varTheta \) equals the number of configurations, which is \(k \ge (n/4s)^{ns/2}\), while an elementary volumetric estimate gives \(k \le 3^{\dim L}=3^{m}\); hence \(m\ge C_{1}ns\log (C_{2}n/s)\) where \(C_{1}=1/(2\log 3)\) and \(C_{2}=1/4\).    \(\square \)
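As an illustrative aside (not part of the original argument), the following Python sketch exhibits a family of s-subsets of \(\{1,\ldots ,n\}\) with pairwise intersections of size less than s/2, the packing fact from [11] used above; the cited theorem guarantees at least \((n/4s)^{s/2}\) such subsets. The values of n, s and the number of random attempts are illustrative choices.

```python
import numpy as np

# Randomized greedy packing (illustrative only, not the construction of [11]):
# collect s-subsets of {0,...,n-1} whose pairwise intersections all have size < s/2.

rng = np.random.default_rng(1)
n, s = 32, 4
guaranteed = (n / (4 * s)) ** (s / 2)       # lower bound promised by the cited theorem

kept = []
for _ in range(20000):                      # random attempts; a sketch, not exhaustive
    cand = frozenset(rng.choice(n, size=s, replace=False).tolist())
    if all(len(cand & R) < s / 2 for R in kept):
        kept.append(cand)

print(f"greedy packing found {len(kept)} subsets; the theorem guarantees >= {guaranteed:.1f}")
# Assigning one such subset R to every column of the n-by-n square yields l^n
# configurations S_1 u ... u S_n with |S_j' n S_j''| < s/2 whenever S_j' != S_j''.
```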

Appendix D: Proofs of Theorems in Section 6

Proof

of Lemma 5. We start with an inequality similar to FACT 4 (with a similar proof): \(W^{2}(\varGamma _{X}; \varPhi _{A,B}) \le \text {E}_{H}[\inf \{\vert \text {H}-t\text {V}\vert _{F}^{2}: t>0,\ \text {V in } \partial \vert \vert \vert \text {X}\vert \vert \vert _{1}\}]\). With the same specifications for \(\text {V}=(\lambda _{1}\xi _{1},\ldots , \lambda _{n}\xi _{n})\) as those in Lemma 4, i.e. (w.l.o.g.) \(\lambda _{j}\ge 0\) for \(j=1,\ldots ,r\), \(\lambda _{1}+\ldots +\lambda _{r}=1\), \(\lambda _{j}=0\) for \(j\ge r+1\); \(\vert \mathbf{x} _{j}\vert _{1}=\max _{k}\vert \mathbf{x} _{k}\vert _{1}\) for \(j=1,\ldots ,r\) and \(\vert \mathbf{x} _{j}\vert _{1}<\max _{k}\vert \mathbf{x} _{k}\vert _{1}\) for \(j\ge 1+r\); \(\xi _{j}(i)=\textit{sgn}(\text {X}_{ij})\) for \(\text {X}_{ij}\not =0\) and \(\vert \xi _{j}(i)\vert \le 1\) for all i and j. Letting \(\textit{\textbf{h}}_{j}\equiv \sum _{l=1}^{m}m^{-1}B_{lj}\text {A}^{T}\varepsilon _{l}\), we have

$$\begin{aligned}&\quad W^{2}\left( \varGamma _{\mathrm {X}} ; \varPhi _{\mathrm {A}, \mathrm {B}}\right) \\&\quad \le \mathrm {E}_{A, B, E}\Big [\,\inf _{t>0,\ \lambda _{j},\ \xi _{j}\ \text {as specified above}}\ \sum \nolimits _{j=1}^{n}\Big | \sum \nolimits _{l=1}^{m}m^{-1}B_{lj}\mathrm {A}^{T}\varepsilon _{l}-t\lambda _{j}\xi _{j}\Big | _{2}^{2}\Big ]\\&\quad =\sum \nolimits _{j=r+1}^{n} \mathrm {E}_{A, B, E}\left[ \left| \textit{\textbf{h}}_{j}\right| _{2}^{2}\right] + \mathrm {E}_{A, B, E}\Big [\,\inf _{t>0,\ \lambda _{j},\ \xi _{j}\ \text {as specified above}}\ \sum \nolimits _{j=1}^{r}\left| \textit{\textbf{h}}_{j}-t \lambda _{j} \xi _{j}\right| _{2}^{2}\Big ]\\&\quad =\mathrm {I}+\mathrm {II} \end{aligned}$$

The two terms are estimated separately. For the first term:

\(\text {I} = \sum _{j=r+1}^n\textit{m}^{-2}\sum _{l,k=1}^m\text {E}_{B}[B_{lj}\text {B}_{kj}]\text {E}_{A,E}[\varepsilon {}_{l}^{T}\text {AA}^{T}\varepsilon {}_{k}] = \textit{m}^{-2}(\textit{n}-\textit{r}) \sum _{l,k=1}^m\delta {}_{lk} \text {E}_{A,E}[\varepsilon {}_{l}^{T}\text {AA}^{T}\varepsilon {}_{l}] =\textit{ }(\textit{n}-\textit{r})\textit{n}\)

To estimate II, for each \(j=1,\ldots ,r\) let S(j) be the support of \(\mathbf{x} _{j}\) (so \(\vert S(j)\vert \le s\)) and \(\sim \)S(j) be its complementary set; then

$$\begin{aligned} \sum \nolimits _{j=1}^{r}\left| \textit{\textbf{h}}_{j}-t \lambda _{j} \xi _{j}\right| _{2}^{2} = \sum \nolimits _{j=1}^{r}\left| \textit{\textbf{h}}_{j| S(j)}-t \lambda _{j} \xi _{j| S(j)}\right| _{2}^{2} + \sum \nolimits _{j=1}^{r}\left| \textit{\textbf{h}}_{j| \sim S(j)}-t \lambda _{j} \xi _{j|\sim S(j)}\right| _{2}^{2} \end{aligned}$$

Notice that all components of \(\xi _{j\vert S(j)}\) are \(\pm 1\) and all components of \(\xi _{j\vert \sim S(j)}\) can be any value in the interval \([-1,+1]\). Select \(\lambda _{1}=\ldots =\lambda _{r}=1/r\), let \(\delta >0\) be an arbitrarily small positive number and select \(t=t(\delta )\) such that \(\text {P}_{A,B,E}[\vert h\vert >t(\delta )/r]\le \delta \) where h is a random scalar such that \(h_{j}(i)\sim h\) and i indicates the vector \({{\textit{\textbf{h}}}_{j}}\)'s i-th component. For each j and each i outside S(j), set \(\xi _{j}\)'s component \(\xi _{j}(i)=rh_{j}(i)/t(\delta )\) if \(\left| h_{j}(i)\right| \le t(\delta )/r\) and otherwise \(\xi _{j}(i)=\text {sgn}\left( h_{j}(i)\right) \); then \(\left| \textit{\textbf{h}}_{j| \sim S(j)}-t \lambda _{j} \xi _{j|\sim S(j)}\right| _{2}^{2}=0\) when \(| \textit{\textbf{h}}_{j|\sim S(j)}|_{\infty }<t(\delta )/r\). Notice also the fact that for independent standard scalar Gaussian variables \(a_{l}, b_{l}\) and Rademacher variables \(\varepsilon _{l}, l=1, \ldots , m\), there exists an absolute constant c such that for any \(\eta >0\):

$$\begin{aligned} \mathrm {P}\left[ \left| m^{-1} \sum _{l, k=1}^{m} b_{l} a_{k} \varepsilon _{k}\right| >\eta \right] <c \exp (-\eta ) \end{aligned}$$
(56)

as a result, in the above expression \(\delta \) can be taken as \(c\exp (-t(\delta )/r)\), and:

\(\mathrm {E}\left[ \left| \textit{\textbf{h}}_{j|\sim S(j)}-t \lambda _{j} \xi _{j|\sim S(j)}\right| _{2}^{2}\right] = \int _{0}^{\infty } du\, \mathrm {P}\left[ \left| \textit{\textbf{h}}_{j|\sim S (j)}-t \lambda _{j} \xi _{j| \sim S(j)}\right| _{2}^{2} >u\right] \)

\(=2 \int _{0}^{\infty } du\, u\, \mathrm {P}\left[ \left| \textit{\textbf{h}}_{j|\sim S(j)}-t \lambda _{j} \xi _{j|\sim S(j)}\right| _{2}>u\right] \)

\(\le 2 \int _{0}^{\infty } du\, u\, \mathrm {P}\left[ \text {some component of } \textit{\textbf{h}}_{j|\sim S(j)}-t \lambda _{j} \xi _{j|\sim S(j)}\text { has magnitude}>(n-s)^{-1/2} u\right] \)

\(\le 2(n-s) \int _{0}^{\infty } d u u P\left[ |h|-t(\delta )/r>(n-s)^{-1/2} u\right] \)

\(\le 2(n-s) \int _{0}^{\infty } d u u \exp \left( -\left( (t(\delta )/r)+(n-s)^{-1/2} u\right) \right) \)

\(\le C_{0}(n-s)^{2} \exp (-(t(\delta )/r)) \le C_{0}(n-s)^{2} \delta \)

where \(C_{0}\) is an absolute constant. On the other hand \(\left| \xi _{j| S(j)}\right| _{2}^{2} \le s\) for \(j=1,\ldots ,r\), so:

\(\mathrm {E}_{A, B, E}\left[ \inf _{t>0, \lambda j, \xi j} \sum _{j=1}^{r}\left| \textit{\textbf{h}}_{j| S(j)}-t \lambda _{j} \xi _{j| S(j)}\right| _{2}^{2}\right] \)

\(\le \mathrm {E}_{A, B, E}\left[ \sum _{j=1}^{r}\left| \textit{\textbf{h}}_{j| S(j)}-t(\delta ) \xi _{j| S(j)}/r\right| _{2}^{2}\right] \)

\(\le \sum _{j=1}^{r} \mathrm {E}_{A, B, E}\left[ m^{-2}\left| \sum _{l=1}^{m} B_{l j}\left( \mathrm {A}^{\mathrm {T}} \varvec{\varepsilon }_{l}\right) _{| S(j)}\right| {}_2^{2}\right] +r s t(\delta )^{2}/r^{2}\)

\(=r s\left( 1+t(\delta )^{2}/r^{2}\right) \)

hence \(\text {II} \le rs(1+t(\delta )^{2}/r^{2}) + C_{0}(n-s)^{2}r\delta \). Combining all the above estimates, we have:

\(W^{2}(\varGamma _{X}; \varPhi _{A,B}) \le \text {I} + \text {II} \le (n-r)n + rs(1+t(\delta )^{2}/r^{2}) + C_{0}n^{2}r\delta = n^{2} - r(n-s(1+t(\delta )^{2}/r^{2})) + C_{0}n^{2}r\delta \)

Substituting \(t(\delta )/r=\log (c/\delta )\), we get, for any \(\delta > 0\):

$$\begin{aligned} W^{2}(\varGamma _{X}; \varPhi _{A,B}) \le n^{2}-r(n-s(1+\log ^{2}(c/\delta ))) + C_{0}n^{2}r\delta \end{aligned}$$

In particular, let \(\delta =1/C_{0}n^{2}r\); then \(W^{2}(\varGamma _{X}; \varPhi _{A,B}) \le n^{2}-r(n-s(1+\log ^{2}(cn^{2}r))) + 1\).    \(\square \)

Proof

of Lemma 6. Recall the second moment inequality \(\text {P}[\text {Z} \ge \xi ] \ge (\text {E[Z]} - \xi )_{+}^{2}/\text {E}[\text {Z}^{2}]\), valid for any non-negative r.v. Z and any \(\xi > 0\). Setting \(\text {Z} = \vert {<}\text {M, U}{>}\vert ^{2}\) and \(\xi =\text {E}[\vert {<}\text {M, U}{>}\vert ^{2}]/2\), we get:

$$\begin{aligned} \mathrm {P}\left[ |{<}\mathrm {M}, \mathrm {U}{>}|^{2} \ge \mathrm {E}\left[ |{<}\mathrm {M}, \mathrm {U}{>}|^{2}\right] /2\right] \ge \mathrm {E}\left[ |{<}\mathrm {M}, \mathrm {U}{>}|^{2}\right] ^{2}/4 \mathrm {E}\left[ |{<}\mathrm {M}, \mathrm {U}{>}|^{4}\right] \end{aligned}$$
(57)

To compute \(\text {E}[\vert {<}\text {M,U}{>}\vert ^{2}]\), let \(\text {U}=\sum _{j}\lambda _{j}\mathbf{u} _{j}\mathbf{v} _{j}^{T}\) be U's singular value decomposition, with \(\mathbf{u} _{i}^{T}\mathbf{u} _{j}=\mathbf{v} _{i}^{T}\mathbf{v} _{j}=\delta _{ij}\) and \(\lambda _{j}>0\) for each j. Notice that \(\text {M}=\mathbf{ab} ^{T}\) where \(\mathbf{a} \sim \mathbf{b} \sim N(0, I_{n})\) are independent of each other, so \({<}\text {M,U}{>} = \mathbf{a} ^{T}\text {U}\mathbf{b} =\sum _{j}\lambda _{j}\mathbf{a} ^{T}\mathbf{u} _{j}\mathbf{v} _{j}^{T}\mathbf{b} \), where the variables \(\mathbf{a} ^{T}\mathbf{u} _{i}\sim \mathbf{v} _{j}^{T}\mathbf{b} \sim N(0,1)\) are mutually independent; hence \(\text {E}[\vert {<}\text {M,U}{>}\vert ^{2}] = \sum _{j}\lambda _{j}^{2}\text {E}[\vert \mathbf{a} ^{T}\mathbf{u} _{j}\vert ^{2}]\text {E}[\vert \mathbf{v} _{j}^{T}\mathbf{b} \vert ^{2}]= \sum _{j}\lambda _{j}^{2} = \vert \text {U}\vert _{F}^{2}=1\) for U as in the assumption.

On the other hand by Gaussian hypercontractivity we have

$$\begin{aligned} (\text {E}[\vert {}{<}\text {M,U}{>}\vert {}^{4}])^{1/4}\le {} \textit{C}_{0}(\text {E}[\vert {}{<}\text {M,U}{>}\vert {}^{2}])^{1/2 }= \textit{C}_{0} \end{aligned}$$

In conclusion \(\text {P}[\vert {}{<}\text {M,U}{>}\vert {}^{2}\ge {} 1/2] = \text {P}[\vert {}{<}\text {M,U}{>}\vert {}^{2}\ge {} \text {E}[\vert {}{<}\text {M,U}{>}\vert {}^{2}]/2] \ge {} \textit{c}\) for U: \(\vert {}\text {U}\vert {}_{F}^{2}=1\).    \(\square \)
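As an illustrative aside (not part of the original argument), the following Python sketch estimates the small-ball probability of Lemma 6 by Monte Carlo: for the rank-one Gaussian matrix \(\text {M}=\mathbf{ab} ^{T}\) and a few fixed directions U with \(\vert \text {U}\vert _{F}=1\), it checks that \(\text {P}[\vert {<}\text {M,U}{>}\vert ^{2}\ge 1/2]\) stays bounded away from zero. The dimension, the test directions and the trial count are illustrative choices.

```python
import numpy as np

# Monte Carlo check (illustrative only) of the small-ball bound in Lemma 6.

rng = np.random.default_rng(2)
n, trials = 32, 50_000

for k in range(3):                                   # a few random unit-Frobenius directions U
    U = rng.standard_normal((n, n))
    U /= np.linalg.norm(U)                           # Frobenius normalization: |U|_F = 1
    a = rng.standard_normal((trials, n))
    b = rng.standard_normal((trials, n))
    inner = np.einsum('ti,ij,tj->t', a, U, b)        # <M, U> = a^T U b, one value per trial
    print(f"direction {k}: P[|<M,U>|^2 >= 1/2] ~= {np.mean(inner**2 >= 0.5):.3f}")
```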

The proof of Lemma 7 is logically the same as the proof of Lemma 5; the only difference concerns the distribution tail of the components of the vectors \({{\textit{\textbf{h}}}_{j}}\equiv \sum _{l=1}^{m}m^{-1}B_{lj}\text {A}^{T}\varepsilon _{l}\), which are \(\sim ^{iid} h \equiv m^{-1}\sum _{l,k=1}^{m}b_{l}a_{k}\varepsilon _{k}\) with independent scalar sub-Gaussian variables \(a_{l}\), \(b_{l}\) and Rademacher variables \(\varepsilon _{l}\), \(l=1,\ldots ,m\). This auxiliary result is presented in the following lemma:

Lemma 8

For independent scalar zero-mean sub-Gaussian variables \(a_{l}, b_{l}\) and Rademacher variables \(\varepsilon _{l}\), \(l=1,\ldots ,m\), let \(\sigma _{A}\equiv \max _{l}\vert a_{l}\vert _{\psi 2}\) and \(\sigma _{B}\equiv \max _{l}\vert b_{l}\vert _{\psi 2}\), where \(\vert .\vert _{\psi 2}\) denotes a sub-Gaussian variable's \(\psi _{2}\)-norm. Then there exists an absolute constant c such that for any \(\eta > 0\):

$$\begin{aligned} \mathrm {P}[|h|>\eta ]<2 \exp \left( -c \eta /\sigma _{\mathrm {A}} \sigma _{\mathrm {B}}\right) \end{aligned}$$
(58)

Proof

Notice that \(a_{k}\varepsilon _{k}\) is a zero-mean sub-Gaussian variable with \(|a_{k}\varepsilon _{k}|_{\psi 2}=|a_{k}|_{\psi 2}\). For \(b=m^{-1/2}\sum _{1\le l\le m}b_{l}\) and \(a=m^{-1/2}\sum _{1\le k\le m}a_{k}\varepsilon _{k}\) we have \(|b|_{\psi 2}\le Cm^{-1/2}\left( \sum _{l}|b_{l}|_{\psi 2}^{2}\right) ^{1/2}\le C\sigma _{\mathrm {B}}\) and \(|a|_{\psi 2}\le Cm^{-1/2}\left( \sum _{k}|a_{k}\varepsilon _{k}|_{\psi 2}^{2}\right) ^{1/2}\le C\sigma _{\mathrm {A}}\), where C is an absolute constant. Furthermore, the product of the two sub-Gaussian variables a and b is sub-exponential with \(\psi _{1}\)-norm \(|ba|_{\psi 1}\le |b|_{\psi 2}|a|_{\psi 2}\le C^{2}\sigma _{\mathrm {A}}\sigma _{\mathrm {B}}\), and \(h\equiv m^{-1}\sum _{l,k=1}^{m}b_{l}a_{k}\varepsilon _{k}=ab\), so h has the distribution tail \(\mathrm {P}[|h|>\eta ]<2\exp \left( -c\eta /\sigma _{A}\sigma _{B}\right) \) where c is an absolute constant.    \(\square \)
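As an illustrative aside (not part of the original argument), the following Python sketch simulates the variable \(h=ab\) of Lemma 8 with standard Gaussian \(a_{k}, b_{l}\) (so that \(\sigma _{A}\sigma _{B}\) is of order one) and compares its empirical tail with the reference curve \(2\exp (-\eta )\); taking c = 1 in the reference curve, as well as the values of m and the trial count, are purely illustrative choices.

```python
import numpy as np

# Simulation sketch (illustrative only) of the sub-exponential tail in Lemma 8.

rng = np.random.default_rng(3)
m, trials = 20, 200_000

a = rng.standard_normal((trials, m))
b = rng.standard_normal((trials, m))
eps = rng.choice([-1.0, 1.0], size=(trials, m))
h = (b.sum(axis=1) / np.sqrt(m)) * ((a * eps).sum(axis=1) / np.sqrt(m))   # h = a * b

for eta in (1.0, 2.0, 4.0, 6.0):
    tail = np.mean(np.abs(h) > eta)
    print(f"eta = {eta:>3}:  P[|h| > eta] ~= {tail:.2e}   vs   2*exp(-eta) = {2*np.exp(-eta):.2e}")
```

The empirical tail sits below the reference curve and decays roughly exponentially in \(\eta \), in line with (58).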

Proof

of Lemma 7. With the same logic as in the proof of Lemma 5 and based upon Lemma 8, the auxiliary parameter \(\delta \) in the argument can be taken as \(2\exp (-ct(\delta )/r\sigma _{A}\sigma _{B})\), equivalently \(t(\delta )/r = c^{-1}\sigma _{A}\sigma _{B}\log (2/\delta )\), which yields the final result.    \(\square \)
