DOI: 10.1145/2623330.2623736
Improved testing of low rank matrices

Published: 24 August 2014

Abstract

We study the problem of determining if an input matrix A ∈ ℝ^{m×n} can be well-approximated by a low rank matrix. Specifically, we study the problem of quickly estimating the rank or stable rank of A, the latter often providing a more robust measure of the rank. Since we seek significantly sublinear time algorithms, we cast these problems in the property testing framework. In this framework, A either has low rank (or low stable rank), or is far from having this property. The algorithm should read only a small number of entries or rows of A and decide which case A is in with high probability. If neither case occurs, the output is allowed to be arbitrary. We consider two notions of being far: (1) A requires changing at least an ε-fraction of its entries, or (2) A requires changing at least an ε-fraction of its rows. We call the former the "entry model" and the latter the "row model". We show:
For testing if a matrix has rank at most d in the entry model, we improve the previous number of entries of A that need to be read from O(d²/ε²) (Krauthgamer and Sasson, SODA 2003) to O(d²/ε). Our algorithm is the first to adaptively query the entries of A, which for constant d we show is necessary to achieve O(1/ε) queries. For the important case of d = 1 we also give a new non-adaptive algorithm, improving the previous O(1/ε²) queries to O(log²(1/ε)/ε).
For testing if a matrix has rank at most d in the row model, we prove an Ω(d/ε) lower bound on the number of rows that need to be read, even for adaptive algorithms. Our lower bound matches a non-adaptive upper bound of Krauthgamer and Sasson.
For testing if a matrix has stable rank at most d in the row model or requires changing an ε/d-fraction of its rows in order to have stable rank at most d, we prove that reading Θ(d²) rows is necessary and sufficient.
We also give an empirical evaluation of our rank and stable rank algorithms on real and synthetic datasets.

Supplementary Material

MP4 File (p691-sidebyside.mp4)

References

[1]
D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 2003.
[2]
E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? J. ACM, 58(3):11, 2011.
[3]
C. H. Q. Ding and X. He. K-means clustering via principal component analysis. In ICML, 2004.
[4]
D. Feldman, M. Schmidt, and C. Sohler. Turning big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms, 2013.
[5]
O. Goldreich. A brief introduction to property testing. In Studies in Complexity and Cryptography, pages 465--469. 2011.
[6]
H. Hotelling. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24(6):417, 1933.
[7]
I. T. Jolliffe. Graphical Representation of Data Using Principal Components. Springer, 2002.
[8]
R. Krauthgamer and O. Sasson. Property testing of data dimensionality. In SODA, pages 18--27, 2003.
[9]
D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems, 2001.
[10]
M. Magdon-Ismail. Row sampling for matrix algorithms via a non-commutative Bernstein bound. arXiv:1008.0587, 2010.
[11]
M. W. Mahoney. Randomized algorithms for matrices and data. Foundations and Trends in Machine Learning, 3(2):123--224, 2011.
[12]
M. Parnas and D. Ron. Testing metric properties. In STOC, pages 276--285, 2001.
[13]
K. Pearson. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2(6):559--572, 1901.
[14]
M. Rudelson and R. Vershynin. Sampling from large matrices: An approach through geometric functional analysis. J. ACM, 54(4), July 2007.
[15]
B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl. Application of dimensionality reduction in recommender systems -- a case study. In Proceedings of the ACM WebKDD Workshop, 2000.
[16]
J. A. Tropp. Column subset selection, matrix factorization, and eigenvalue optimization. In SODA, pages 978--986, 2009.
[17]
C. Yang, T. Han, L. Quan, and C.-L. Tai. Parsing façade with rank-one approximation. In CVPR, pages 1720--1727, 2012.


Published In

KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2014
2028 pages
ISBN:9781450329569
DOI:10.1145/2623330

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. dimensionality reduction
  2. principal component analysis
  3. property testing
  4. robustness
  5. stable rank

Qualifiers

  • Research-article

Conference

KDD '14

Acceptance Rates

KDD '14 Paper Acceptance Rate 151 of 1,036 submissions, 15%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Cited By

  • (2024) Sublinear Time Eigenvalue Approximation via Random Sampling. Algorithmica 86(6):1764–1829. DOI: 10.1007/s00453-024-01208-5.
  • (2021) Property Testing of the Boolean and Binary Rank. Theory of Computing Systems. DOI: 10.1007/s00224-021-10047-8.
  • (2020) Testing Positive Semi-Definiteness via Random Submatrices. FOCS 2020, pages 1191–1202. DOI: 10.1109/FOCS46700.2020.00114.
  • (2019) Testing Matrix Rank, Optimally. SODA 2019, pages 727–746. DOI: 10.5555/3310435.3310481.
  • (2019) Testing Proximity to Subspaces: Approximate ℓ∞ Minimization in Constant Time. Algorithmica. DOI: 10.1007/s00453-019-00642-0.
  • (2017) Local Reconstruction of Low-Rank Matrices and Subspaces. Random Structures & Algorithms 51(4):607–630. DOI: 10.1002/rsa.20720.
  • (2016) On Approximating Functions of the Singular Values in a Stream. STOC 2016, pages 726–739. DOI: 10.1145/2897518.2897581.
