DOI: 10.1145/2623330.2623736
Improved testing of low rank matrices

Published: 24 August 2014

Abstract

We study the problem of determining if an input matrix A ∈ ℝ^{m×n} can be well-approximated by a low rank matrix. Specifically, we study the problem of quickly estimating the rank or stable rank of A, the latter often providing a more robust measure of the rank. Since we seek significantly sublinear time algorithms, we cast these problems in the property testing framework. In this framework, A either has low rank (or low stable rank), or is far from having this property. The algorithm should read only a small number of entries or rows of A and decide which case A is in with high probability. If neither case occurs, the output is allowed to be arbitrary. We consider two notions of being far: (1) A requires changing at least an ε-fraction of its entries, or (2) A requires changing at least an ε-fraction of its rows. We call the former the "entry model" and the latter the "row model". We show:
For testing if a matrix has rank at most d in the entry model, we improve the previous number of entries of A that need to be read from O(d²/ε²) (Krauthgamer and Sasson, SODA 2003) to O(d²/ε). Our algorithm is the first to adaptively query the entries of A, which for constant d we show is necessary to achieve O(1/ε) queries. For the important case of d = 1 we also give a new non-adaptive algorithm, improving the previous O(1/ε²) queries to O(log²(1/ε)/ε).
For testing if a matrix has rank at most d in the row model, we prove an Ω(d/ε) lower bound on the number of rows that need to be read, even for adaptive algorithms. Our lower bound matches a non-adaptive upper bound of Krauthgamer and Sasson.
For testing if a matrix has stable rank at most d in the row model or requires changing an ε/d-fraction of its rows in order to have stable rank at most d, we prove that reading Θ(d²) rows is necessary and sufficient.
We also give an empirical evaluation of our rank and stable rank algorithms on real and synthetic datasets.

Supplementary Material

MP4 File (p691-sidebyside.mp4)

References

[1]
D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 2003.
[2]
E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? J. ACM, 58(3):11, 2011.
[3]
C. H. Q. Ding and X. He. K-means clustering via principal component analysis. In ICML, 2004.
[4]
D. Feldman, M. Schmidt, and C. Sohler. Turning big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms, 2013.
[5]
O. Goldreich. A brief introduction to property testing. In Studies in Complexity and Cryptography, pages 465--469. 2011.
[6]
H. Hotelling. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24(6):417, 1933.
[7]
I. T. Jolliffe. Graphical Representation of Data Using Principal Components. Springer, 2002.
[8]
R. Krauthgamer and O. Sasson. Property testing of data dimensionality. In SODA, pages 18--27, 2003.
[9]
D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems, 2001.
[10]
M. Magdon-Ismail. Row sampling for matrix algorithms via a non-commutative Bernstein bound. arXiv:1008.0587, 2010.
[11]
M. W. Mahoney. Randomized algorithms for matrices and data. Foundations and Trends in Machine Learning, 3(2):123--224, 2011.
[12]
M. Parnas and D. Ron. Testing metric properties. In STOC, pages 276--285, 2001.
[13]
K. Pearson. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2(6):559--572, 1901.
[14]
M. Rudelson and R. Vershynin. Sampling from large matrices: An approach through geometric functional analysis. J. ACM, 54(4), July 2007.
[15]
B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl. Application of dimensionality reduction in recommender systems -- a case study. In Proceedings of the ACM WebKDD Workshop, 2000.
[16]
J. A. Tropp. Column subset selection, matrix factorization, and eigenvalue optimization. In SODA, pages 978--986, 2009.
[17]
C. Yang, T. Han, L. Quan, and C.-L. Tai. Parsing façade with rank-one approximation. In CVPR, pages 1720--1727, 2012.


Published In

KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2014
2028 pages
ISBN:9781450329569
DOI:10.1145/2623330

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. dimensionality reduction
  2. principal component analysis
  3. property testing
  4. robustness
  5. stable rank

Qualifiers

  • Research-article

Conference

KDD '14

Acceptance Rates

KDD '14 Paper Acceptance Rate 151 of 1,036 submissions, 15%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Cited By

  • (2024) Sublinear Time Eigenvalue Approximation via Random Sampling. Algorithmica 86(6):1764–1829. DOI: 10.1007/s00453-024-01208-5.
  • (2021) Property Testing of the Boolean and Binary Rank. Theory of Computing Systems. DOI: 10.1007/s00224-021-10047-8.
  • (2020) Testing Positive Semi-Definiteness via Random Submatrices. FOCS 2020, pages 1191–1202. DOI: 10.1109/FOCS46700.2020.00114.
  • (2019) Testing Matrix Rank, Optimally. SODA 2019, pages 727–746. DOI: 10.5555/3310435.3310481.
  • (2019) Testing Proximity to Subspaces: Approximate ℓ∞ Minimization in Constant Time. Algorithmica. DOI: 10.1007/s00453-019-00642-0.
  • (2017) Local Reconstruction of Low-Rank Matrices and Subspaces. Random Structures & Algorithms 51(4):607–630. DOI: 10.1002/rsa.20720.
  • (2016) On Approximating Functions of the Singular Values in a Stream. STOC 2016, pages 726–739. DOI: 10.1145/2897518.2897581.
