Incoherent Submatrix Selection via Approximate Independence Sets in Scalar Product Graphs

  • Conference paper
Machine Learning, Optimization, and Data Science (LOD 2019)

Abstract

This paper addresses the problem of extracting the largest possible number of columns from a given matrix \(X\in \mathbb R^{n\times p}\) in such a way that the resulting submatrix has a coherence smaller than a given threshold \(\eta \). This problem can be expressed as that of finding a maximum cardinality stable set in the graph whose adjacency matrix is obtained by taking the componentwise absolute value of \(X^tX\) and setting entries less than \(\eta \) to 0 and the other entries to 1. We propose a spectral-type relaxation which boils down to optimising a quadratic function on a sphere. We prove a theoretical approximation bound for the solution of the resulting relaxed problem.
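The graph construction described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `coherence_graph` and `is_incoherent_selection` are hypothetical, the columns are rescaled to unit norm before forming \(X^tX\), and the diagonal is set to zero so that a column is not considered adjacent to itself.

```python
import numpy as np

def coherence_graph(X, eta):
    """Return the 0/1 adjacency matrix obtained by thresholding |X^t X| at eta."""
    # Assumption: columns are rescaled to unit norm, so Gram entries are correlations.
    Xn = X / np.linalg.norm(X, axis=0, keepdims=True)
    G = np.abs(Xn.T @ Xn)
    # Entries smaller than eta become 0, all other entries become 1.
    A = (G >= eta).astype(int)
    # Assumption: self-loops are discarded, otherwise no vertex could be selected.
    np.fill_diagonal(A, 0)
    return A

def is_incoherent_selection(A, cols):
    """True if no two selected columns are adjacent, i.e. cols is a stable set."""
    cols = np.asarray(cols)
    return not A[np.ix_(cols, cols)].any()
```

For instance, with columns \(e_1\), \(e_2\) and \((e_1+e_2)/\sqrt{2}\) and \(\eta = 0.5\), the first two columns form a stable set, while any pair containing the third column does not.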

Notes

  1. Here, positivity is trivial.

References

  1. Adcock, B., Hansen, A.C., Poon, C., Roman, B.: Breaking the coherence barrier: a new theory for compressed sensing. Forum Math. Sigma 5, 84 (2017)

  2. Adcock, B., Hansen, A.C., Poon, C., Roman, B., et al.: Breaking the coherence barrier: asymptotic incoherence and asymptotic sparsity in compressed sensing. Preprint (2013)

  3. Arora, S., Ge, R., Moitra, A.: New algorithms for learning incoherent and overcomplete dictionaries. In: Conference on Learning Theory, pp. 779–806 (2014)

  4. Baraniuk, R.G.: Compressive sensing [lecture notes]. IEEE Signal Process. Mag. 24(4), 118–121 (2007)

  5. Bellec, P.C.: Localized Gaussian width of \(m\)-convex hulls with applications to lasso and convex aggregation. arXiv preprint arXiv:1705.10696 (2017)

  6. Bühlmann, P., Van De Geer, S.: Statistics for High-dimensional Data: Methods, Theory and Applications. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20192-9

  7. Candes, E., Romberg, J.: Sparsity and incoherence in compressive sampling. Inverse Prob. 23(3), 969 (2007)

  8. Candès, E.J.: Mathematics of sparsity (and a few other things). In: Proceedings of the International Congress of Mathematicians, Seoul, South Korea, vol. 123. Citeseer (2014)

  9. Candes, E.J., Eldar, Y.C., Needell, D., Randall, P.: Compressed sensing with coherent and redundant dictionaries. Appl. Comput. Harmonic Anal. 31(1), 59–73 (2011)

  10. Candès, E.J., Plan, Y.: Near-ideal model selection by l1 minimization. Ann. Stat. 37(5A), 2145–2177 (2009)

  11. Candès, E.J., Wakin, M.B.: An introduction to compressive sampling. IEEE Signal Process. Mag. 25(2), 21–30 (2008)

  12. Cevher, V., Boufounos, P., Baraniuk, R.G., Gilbert, A.C., Strauss, M.J.: Near-optimal Bayesian localization via incoherence and sparsity. In: International Conference on Information Processing in Sensor Networks, IPSN 2009, pp. 205–216. IEEE (2009)

  13. Chrétien, S., Darses, S.: Invertibility of random submatrices via tail-decoupling and a matrix Chernoff inequality. Stat. Probab. Lett. 82(7), 1479–1487 (2012)

  14. Chrétien, S., Darses, S.: Sparse recovery with unknown variance: a Lasso-type approach. IEEE Trans. Inf. Theory 60(7), 3970–3988 (2014)

  15. Foucart, S., Rauhut, H.: A Mathematical Introduction to Compressive Sensing, vol. 1. Birkhäuser, Basel (2013)

  16. Hager, W.W.: Minimizing a quadratic over a sphere. SIAM J. Optim. 12(1), 188–208 (2001)

  17. Mallat, S.: A Wavelet Tour of Signal Processing: The Sparse Way. Academic Press, Cambridge (2008)

  18. Nelson, J.L., Temlyakov, V.N.: On the size of incoherent systems. J. Approximation Theory 163(9), 1238–1245 (2011)

  19. Romberg, J.: Imaging via compressive sampling. IEEE Signal Process. Mag. 25(2), 14–20 (2008)

  20. Van De Geer, S.A., Bühlmann, P., et al.: On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3, 1360–1392 (2009)

Corresponding author

Correspondence to Stéphane Chrétien.

A Minimizing Quadratic Functionals on the Sphere

A.1 A Semi-explicit Solution

The following result can be found in [16].

Lemma 1

For \(Q\in \mathbb S_p\) and \(q\in \mathbb R^p\), consider the following quadratic programming problem over the sphere:

$$\begin{aligned} \min _{\Vert x \Vert _2=1} \quad \frac{1}{2} x^tQx-q^tx. \end{aligned}$$
(10)

Let \(\lambda _1 \le \ldots \le \lambda _p\) be the eigenvalues of Q and \(\phi _1,\ldots ,\phi _p\) be associated pairwise orthogonal, unit-norm eigenvectors. Let \(\gamma _i=q^t \phi _i\), \(i=1,\ldots ,p\). Let \(\mathcal E_1=\{i \text { s.t. } \lambda _i=\lambda _1 \}\) and \(\mathcal E_+=\{i \text { s.t. } \lambda _i>\lambda _1 \}\). Then, \(x^*\) is a solution if and only if

$$\begin{aligned} x^*&= \sum _{i=1}^p c^*_i \phi _i \end{aligned}$$

and

  1. degenerate case: If \(\gamma _i=0\) for all \(i \in \mathcal E_1\) and

    $$\begin{aligned} \sum _{i\in \mathcal E_+} \, \frac{\gamma _i^2}{(\lambda _i-\lambda _1)^2} \le 1, \end{aligned}$$

    then \(c_i^*=\gamma _i/(\lambda _i-\lambda _1)\), \(i\in \mathcal E_+\), and \(c_i^*\), \(i\in \mathcal E_1\), are arbitrary under the constraint that \(\sum _{i\in \mathcal E_1} (c^*_i)^2 = 1-\sum _{i\in \mathcal E_+} (c^*_i)^2\).

  2. nondegenerate case: Otherwise, \(c_i^*=\gamma _i/(\lambda _i-\mu )\), \(i=1,\ldots ,p\), where \(\mu < \lambda _1\) is a solution of

    $$\begin{aligned} \sum _{i=1}^p \, \frac{\gamma _i^2}{(\lambda _i-\mu )^2}&= 1. \end{aligned}$$
    (11)
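The nondegenerate case of Lemma 1 lends itself to a short numerical sketch. The code below is not taken from [16]: the function name `solve_sphere_qp` is hypothetical, and plain bisection is used here as a simple way to locate the root of (11). It assumes that the multiplier satisfies \(\mu <\lambda _1\), so that the root can be bracketed in \([\lambda _1-\Vert \gamma \Vert _2,\lambda _1)\), where the left-hand side of (11) is increasing in \(\mu \).

```python
import numpy as np

def solve_sphere_qp(Q, q, max_iter=200):
    """Minimise 0.5 * x^t Q x - q^t x over the unit sphere (nondegenerate case only)."""
    lam, Phi = np.linalg.eigh(Q)      # ascending eigenvalues, orthonormal eigenvectors
    gamma = Phi.T @ q                 # gamma_i = q^t phi_i
    E1 = np.isclose(lam, lam[0])      # indices attached to the smallest eigenvalue
    tail = np.sum(gamma[~E1] ** 2 / (lam[~E1] - lam[0]) ** 2)
    if np.allclose(gamma[E1], 0.0) and tail <= 1.0:
        raise ValueError("degenerate case: not handled by this sketch")

    def secular(mu):
        # Left-hand side of (11) minus one.
        return np.sum(gamma ** 2 / (lam - mu) ** 2) - 1.0

    # secular(lo) <= 0 because every (lam_i - lo)^2 is at least ||gamma||^2.
    lo, hi = lam[0] - np.linalg.norm(gamma), lam[0]
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:    # interval exhausted in floating point
            break
        if secular(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    mu = lo
    x = Phi @ (gamma / (lam - mu))    # c_i = gamma_i / (lam_i - mu) in the eigenbasis
    return x, mu
```

For example, with \(Q=\mathrm{diag}(1,2,3)\) and \(q=(0.3,0.2,0.1)^t\), the returned vector has unit norm up to the bisection accuracy and the returned multiplier lies strictly between 0 and \(\lambda _1=1\).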

Moreover, we have the following useful result.

Corollary 1

If Q is positive definite, and \(\sum _{i=1,\ldots ,p} \ \gamma _i^2/\lambda _i^2 <1\), then \(0<\mu <\lambda _1\).

Proof

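This follows from the intermediate value theorem: the map \(\mu \mapsto \sum _{i=1}^p \gamma _i^2/(\lambda _i-\mu )^2\) is continuous on \([0,\lambda _1)\), takes a value smaller than 1 at \(\mu =0\) by assumption, and exceeds 1 for \(\mu \) close enough to \(\lambda _1\) in the nondegenerate case.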

A.2 Bounds on \(\mu \)

From (11), we can get the following easy bounds on \(\mu \).

Lemma 2

Let \(\gamma _{\min }= \min _{i=1}^p \vert \gamma _i\vert \) and \(\gamma _{\max }= \max _{i=1}^p \vert \gamma _i\vert \). Then, we have

$$\begin{aligned} p \gamma _{\max }^2 \ge \max _{i=1}^p \ \{(\lambda _i-\mu )^2\}&\ge p \gamma _{\min }^2. \end{aligned}$$
(12)

and

$$\begin{aligned} \gamma _{\min }^2 \le \min _{i=1}^p \ \{(\lambda _i-\mu )^2\}&\le \Vert \gamma \Vert _2^2 \end{aligned}$$
(13)

Proof

The proof is divided into two parts, one for each (double) inequality.

Proof of (12): We have

$$\begin{aligned} \max _{i=1}^p \frac{\gamma ^2_{\max }}{(\lambda _i-\mu )^2}&\ge \max _{i=1}^p \frac{\gamma _i^2}{(\lambda _i-\mu )^2} \\&\ge \frac{1}{p} \sum _{i=1}^p \frac{\gamma _i^2}{(\lambda _i-\mu )^2}\\&=\frac{1}{p}. \end{aligned}$$

This immediately gives \(p \gamma _{\max }^2 \ge \max _{i=1}^p \ \{(\lambda _i-\mu )^2\}\). Moreover, by (11), we have

$$\begin{aligned} 1= \sum _{i=1}^p \, \frac{\gamma _i^2}{(\lambda _i-\mu )^2}&\ge \frac{p\gamma _{\min }^2}{\max _{i=1}^p \{(\lambda _i-\mu )^2\}}. \end{aligned}$$

Therefore, we get \(\max _{i=1}^p \{(\lambda _i-\mu )^2\}\ge p \ \gamma _{\min }^2\).

Proof of (13): Since the terms in (11) are nonnegative and sum to one, we have

$$\begin{aligned} \frac{\gamma _i^2}{(\lambda _i-\mu )^2}&\le 1 \end{aligned}$$

which gives

$$\begin{aligned} (\lambda _i-\mu )^2&\ge \gamma _i^2 \end{aligned}$$

for \(i=1,\ldots ,p\). Thus, the lower bound follows. For the other bound, since

$$\begin{aligned} \sum _{i=1}^p \frac{\gamma _i^2}{(\lambda _i-\mu )^2}&=1, \end{aligned}$$
(14)

we get

$$\begin{aligned} 1 = \sum _{i=1}^p \frac{\gamma _i^2}{(\lambda _i-\mu )^2}&\le \frac{\Vert \gamma \Vert _2^2}{\min _{i=1}^p \ (\lambda _i-\mu )^2} \end{aligned}$$

and the proof is completed.

A.3 \(\ell _\infty \) Perturbation of the Linear Term

We now consider the problem of controlling the solution under perturbation of q.

Lemma 3

Consider the two quadratic programming problems over the sphere:

$$\begin{aligned} \min _{\Vert x\Vert _2=1} \quad \frac{1}{2} x^tQx-q_k^tx, \end{aligned}$$
(15)

for \(k=1,2\). Assume that the solution to (15) is non-degenerate in both cases \(k=1,2\) and let \(x^*_1\) and \(x^*_2\) be the corresponding solutions. Assume further that \(\sum _{i=1,\ldots ,p} \ \gamma _{k,i}^2/\lambda _i^2 <1\), \(k=1,2\). Let \(\phi \) denote the inverse function of \(x\mapsto x/(1+x)^3\). Then, we have

$$\begin{aligned} \Vert x_1^*-x_2^*\Vert _{\infty }&\le \sqrt{p} \left( \frac{\Vert \gamma _{1}-\gamma _{2}\Vert _2 }{(\lambda _1-\mu _2)}+ \frac{\Vert \gamma _{1}\Vert _2 \ \nu ^*}{(\lambda _1-\mu _1)(\lambda _1-\mu _2)} \right) ^2, \end{aligned}$$

with \(\nu ^*\) given by

$$\begin{aligned} \nu ^*&= (\lambda _p-\mu _1) \phi \left( p \ \frac{\gamma _{1,\max }^2}{\gamma _{1,\min }^2} \frac{\Vert \gamma _1^2-\gamma _2^2 \Vert _1}{2 \ \Vert \gamma _2\Vert _2^2}\right) . \end{aligned}$$

Proof

Let \(\varPhi \) denote the matrix whose columns are the eigenvectors of Q. More precisely, let \(\lambda _1\le \cdots \le \lambda _p\) be the eigenvalues of Q and let \(\phi _i\) be a unit-norm eigenvector associated with \(\lambda _i\), \(i=1,\ldots ,p\). Let \(\gamma _{k,i}=q_k^t \phi _i\), \(i=1,\ldots ,p\), \(k=1,2\). Let \(c_1^*\) (resp. \(c_2^*\)) be the vector of coefficients of \(x_1^*\) (resp. \(x_2^*\)) in the eigenbasis of Q. By Lemma 1, for each \(k=1,2\), there exists a real \(\mu _k\) such that

$$\begin{aligned} c_{k,i}^*=\frac{\gamma _{k,i}}{(\lambda _i-\mu _k)}, \end{aligned}$$

for \(i=1,\ldots ,p\), where \(\mu _k < \lambda _1\) is a solution of

$$\begin{aligned} \sum _{i=1}^p \, \frac{\gamma _{k,i}^2}{(\lambda _i-\mu )^2} = 1. \end{aligned}$$

Now, apply Neuberger’s Theorem 2 to obtain an estimation of \(\vert \mu _1-\mu _2\vert \) as a function of \(\gamma _1\) and \(\gamma _2\). For this purpose, set

$$\begin{aligned} F(\mu )&= \sum _{i=1}^p \, \frac{\gamma _{2,i}^2}{(\lambda _i-\mu )^2} -1, \quad \text {so that} \quad F'(\mu ) = 2 \sum _{i=1}^p \ \frac{\gamma _{2,i}^2}{(\lambda _i-\mu )^3}. \end{aligned}$$

We now need to find the smallest value of \(\nu \) such that, for all \(\mu \in B(\mu _1,\nu )\), there is a number \(h \in \bar{B}(0,\nu )\) such that

$$\begin{aligned} h&= F'(\mu )^{-1}\ F(\mu _1) \end{aligned}$$

We therefore have that

$$\begin{aligned} h&= \frac{\sum _{i=1}^p \frac{\gamma ^2_{2,i}}{(\lambda _i-\mu _1)^2}-1}{2 \ \sum _{i=1}^p \frac{\gamma ^2_{2,i}}{(\lambda _i-\mu )^3}} = \frac{\sum _{i=1}^p \frac{\gamma ^2_{1,i}}{(\lambda _i-\mu _1)^2}-1+ \sum _{i=1}^p \frac{\gamma ^2_{2,i}-\gamma ^2_{1,i}}{(\lambda _i-\mu _1)^2}}{2 \ \sum _{i=1}^p \frac{\gamma ^2_{2,i}}{(\lambda _i-\mu )^3}} \end{aligned}$$

and since

$$\begin{aligned} \sum _{i=1}^p \ \frac{\gamma ^2_{1,i}}{(\lambda _i-\mu _1)^2}&=1, \end{aligned}$$

we have

$$\begin{aligned} h&\le \frac{ (\min _{i=1}^p\ \{(\lambda _i -\mu _1)^{2}\})^{-1} \ \Vert \gamma ^2_1-\gamma ^2_2\Vert _1 }{2 \ \Vert \gamma _{2} \Vert _2^2 \ (\max \{(\lambda _i-\mu )^3\})^{-1}} \end{aligned}$$

where \(\cdot ^2\) is to be understood componentwise. Moreover, since \(\sum _{i=1,\ldots ,p} \gamma _{k,i}^2/\lambda _i^2 <1\), \(k=1,2\),

$$\begin{aligned} \max \{(\lambda _i-\mu )^3\}&= (\lambda _p-\mu _1 +\nu )^3 \text { and } \min _{i=1}^p \{(\lambda _i-\mu _1)^2\} = (\lambda _1-\mu _1)^2. \end{aligned}$$

Thus, for \(\nu >0\) such that

$$\begin{aligned} \nu&\ge \frac{ \Vert \gamma ^2_1-\gamma ^2_2\Vert _1 \ (\lambda _p-\mu _1+\nu )^{3} }{2 \ \Vert \gamma _{2} \Vert _2^2 \ (\lambda _1 -\mu _1)^2}, \end{aligned}$$

we get from Theorem 2 that there exists a solution to the equation \(F(u)=0\) inside the ball \(\bar{B}(\mu _1,\nu )\). Make the change of variable

$$\begin{aligned} \nu&= \alpha (\lambda _p-\mu _1) \end{aligned}$$

and obtain that we need to find \(\alpha \in (0,1)\) such that

$$\begin{aligned} \frac{\alpha }{(1+\alpha )^3}&\ge \frac{ \Vert \gamma ^2_1-\gamma ^2_2\Vert _1 \ (\lambda _p-\mu _1)^{2} }{2 \ \Vert \gamma _{2} \Vert _2^2 \ (\lambda _1 -\mu _1)^2}. \end{aligned}$$

Lemma 2 now gives

$$\begin{aligned} \frac{ (\lambda _p-\mu _1)^{2} }{(\lambda _1 -\mu _1)^2}&\le p\ \frac{\gamma _{1,\max }^2}{\gamma _{1,\min }^2} \end{aligned}$$

from which we get that the value \(\nu ^*\) of \(\nu \) given by

$$\begin{aligned} \nu ^*&= (\lambda _p-\mu _1) \phi \left( p \ \frac{\gamma _{1,\max }^2}{\gamma _{1,\min }^2} \frac{\Vert \gamma _1^2-\gamma _2^2 \Vert _1}{2 \ \Vert \gamma _2\Vert _2^2}\right) \end{aligned}$$

is admissible, for \(\Vert \gamma _1^2-\gamma _2^2\Vert _1\) such that the term involving \(\phi \) is less than one. Turning to the coefficients, for each \(i=1,\ldots ,p\) we have

$$\begin{aligned} \frac{\gamma _{1,i}}{(\lambda _i-\mu _1)}-\frac{\gamma _{2,i}}{(\lambda _i-\mu _2)}&= \frac{\gamma _{1,i}(\lambda _i-\mu _1+\mu _1-\mu _2)-\gamma _{2,i}(\lambda _i-\mu _1)}{(\lambda _i-\mu _1)(\lambda _i-\mu _2)} \\&= \frac{(\gamma _{1,i}-\gamma _{2,i})}{\lambda _i-\mu _2}+\frac{\gamma _{1,i}(\mu _1-\mu _2)}{(\lambda _i-\mu _1)(\lambda _i-\mu _2)}. \end{aligned}$$

Therefore,

$$\begin{aligned} \Vert c_1^*-c_2^* \Vert _2^2&\le \left( \frac{\Vert \gamma _{1}-\gamma _{2}\Vert _2 }{(\lambda _1-\mu _2)}+ \frac{\Vert \gamma _{1}\Vert _2 \ \vert \mu _1-\mu _2\vert }{(\lambda _1-\mu _1)(\lambda _1-\mu _2)} \right) ^2. \end{aligned}$$

Finally, using that \(\vert \mu _1-\mu _2\vert \le \nu ^*\), we get

$$\begin{aligned} \Vert c^*_1-c^*_2 \Vert _2^2&\le \left( \frac{\Vert \gamma _{1}-\gamma _{2}\Vert _2 }{(\lambda _1-\mu _2)}+ \frac{\Vert \gamma _{1}\Vert _2 \ \nu ^*}{(\lambda _1-\mu _1)(\lambda _1-\mu _2)} \right) ^2, \end{aligned}$$

which gives

$$\begin{aligned} \Vert x_1^*-x_2^*\Vert _{\infty }&\le \sqrt{p} \left( \frac{\Vert \gamma _{1}-\gamma _{2}\Vert _2 }{(\lambda _1-\mu _2)}+ \frac{\Vert \gamma _{1}\Vert _2 \ \nu ^*}{(\lambda _1-\mu _1)(\lambda _1-\mu _2)} \right) ^2, \end{aligned}$$

as announced.
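The function \(\phi \) appearing in \(\nu ^*\) is not given in closed form here, so a numerical evaluation may help. The sketch below is an illustration under stated assumptions, not part of the original proof: the names `g`, `phi` and `nu_star` are hypothetical, and the inverse is taken on \([0,1/2]\), where \(x\mapsto x/(1+x)^3\) is increasing since its derivative is \((1-2x)/(1+x)^4\). Its range on that interval is \([0,4/27]\). The code also assumes that \(\gamma _{1,\max }\) and \(\gamma _{1,\min }\) are the largest and smallest absolute values of the entries of \(\gamma _1\).

```python
import numpy as np

def g(x):
    """The map x -> x / (1 + x)^3 whose inverse is used in Lemma 3."""
    return x / (1.0 + x) ** 3

def phi(y, tol=1e-14):
    """Inverse of g restricted to [0, 1/2], computed by bisection."""
    if not 0.0 <= y <= g(0.5):
        raise ValueError("y must lie in [0, 4/27], the range of g on [0, 1/2]")
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def nu_star(lam_p, mu_1, gamma1, gamma2):
    """Evaluate the expression for nu^* stated in Lemma 3."""
    gamma1 = np.asarray(gamma1, dtype=float)
    gamma2 = np.asarray(gamma2, dtype=float)
    p = gamma1.size
    # Assumption: gamma_{1,max}, gamma_{1,min} are taken in absolute value.
    ratio = np.max(np.abs(gamma1)) ** 2 / np.min(np.abs(gamma1)) ** 2
    arg = p * ratio * np.sum(np.abs(gamma1 ** 2 - gamma2 ** 2)) / (2.0 * np.sum(gamma2 ** 2))
    return (lam_p - mu_1) * phi(arg)
```

When \(\gamma _1=\gamma _2\), the argument of \(\phi \) vanishes and the computed \(\nu ^*\) is zero up to the bisection tolerance, in line with the two problems sharing the same multiplier.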

A.4 Neuberger’s Theorem

In this subsection, we recall Neuberger’s theorem.

Theorem 2

Suppose that \(r > 0\), that \(x \in \mathbb R^p\), and that F is a continuous function from \(\bar{B}(x,r)\) to \(\mathbb R^m\) with the property that for each y in \(B(x,r)\), there is an h in \(\bar{B}(0,r)\) such that

$$\begin{aligned} \lim _{t\rightarrow 0+} \ \frac{(F(y + th) - F(y))}{t}&= -F(x). \end{aligned}$$
(16)

Then, there exists u in \(\bar{B}(x,r)\) such that \(F(u) = 0\).
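As an illustration of how condition (16) reads in the scalar setting of Lemma 3, assume that \(p=m=1\) and that F is differentiable on \(B(x,r)\) with \(F'(y)\ne 0\) (these are assumptions made for this example only). The one-sided limit in (16) is then the directional derivative \(F'(y)h\), so that the hypothesis asks, for each y in \(B(x,r)\), for a number h satisfying

$$\begin{aligned} F'(y)\, h&= -F(x), \quad \text {i.e.} \quad h = -\frac{F(x)}{F'(y)}, \quad \text {with } \vert h\vert \le r. \end{aligned}$$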

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Chrétien, S., Ho, Z.W.O. (2019). Incoherent Submatrix Selection via Approximate Independence Sets in Scalar Product Graphs. In: Nicosia, G., Pardalos, P., Umeton, R., Giuffrida, G., Sciacca, V. (eds) Machine Learning, Optimization, and Data Science. LOD 2019. Lecture Notes in Computer Science, vol 11943. Springer, Cham. https://doi.org/10.1007/978-3-030-37599-7_9

  • DOI: https://doi.org/10.1007/978-3-030-37599-7_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37598-0

  • Online ISBN: 978-3-030-37599-7

  • eBook Packages: Computer Science, Computer Science (R0)
