DOI: 10.1145/3313276.3316350

Performance of Johnson-Lindenstrauss transform for k-means and k-medians clustering

Published: 23 June 2019

Abstract

Consider an instance of Euclidean k-means or k-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of (1+ε) under a projection onto a random O(log(k/ε)/ε²)-dimensional subspace. Further, the cost of every clustering is preserved within (1+ε). More generally, our result applies to any dimension reduction map satisfying a mild sub-Gaussian-tail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean k-clustering with the distances raised to the p-th power for any constant p.
For k-means, our result resolves an open problem posed by Cohen, Elder, Musco, Musco, and Persu (STOC 2015); for k-medians, it answers a question raised by Kannan.
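To make the statement concrete, here is a minimal sketch (not code from the paper, assuming numpy and scikit-learn are available): it draws a Gaussian Johnson-Lindenstrauss map into m = O(log(k/ε)/ε²) dimensions, clusters the projected points, and evaluates the same partition in the original space. The synthetic instance, the hidden constant (4 below), and the use of scikit-learn's KMeans are all illustrative assumptions.

    # Illustrative sketch of the paper's guarantee; the constant 4 and the
    # synthetic instance are assumptions, not prescriptions from the paper.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    n, d, k, eps = 2000, 500, 10, 0.5

    # Synthetic instance: k well-separated Gaussian blobs in R^d.
    centers = rng.normal(scale=10.0, size=(k, d))
    X = centers[rng.integers(k, size=n)] + rng.normal(size=(n, d))

    # Target dimension m = O(log(k/eps)/eps^2); the constant is a guess.
    m = int(np.ceil(4 * np.log(k / eps) / eps**2))  # m = 48 here

    # Gaussian JL map with entries N(0, 1/m): squared distances are
    # preserved in expectation, with sub-Gaussian concentration.
    G = rng.normal(scale=1.0 / np.sqrt(m), size=(d, m))
    Y = X @ G

    def kmeans_cost(points, labels):
        # k-means cost of a partition: total squared distance to cluster means.
        return sum(
            ((points[labels == c] - points[labels == c].mean(axis=0)) ** 2).sum()
            for c in np.unique(labels)
        )

    # Cluster in the projected space, then score the same partition in both
    # spaces; the theorem says the two costs agree up to a (1 + eps) factor.
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Y)
    ratio = kmeans_cost(Y, labels) / kmeans_cost(X, labels)
    print(f"m = {m}, projected/original cost ratio = {ratio:.3f}")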

References

[1]
{AC06} Nir Ailon and Bernard Chazelle. Approximate nearest neighbors and the fast Johnson–Lindenstrauss transform. In Proceedings of the Symposium on Theory of Computing, pages 557–563, 2006.
[2]
{Ach03} Dimitris Achlioptas. Database-friendly random projections: Johnson–Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4):671–687, 2003.
[3]
{AL09} Nir Ailon and Edo Liberty. Fast dimension reduction using Rademacher series on dual BCH codes. Discrete & Computational Geometry, 42(4):615, 2009.
[4]
{AL13} Nir Ailon and Edo Liberty. An almost optimal unrestricted fast Johnson–Lindenstrauss transform. ACM Transactions on Algorithms, 9(3):21, 2013.
[5]
{Alo03} Noga Alon. Problems and results in extremal combinatorics–I. Discrete Mathematics, 273(1–3):31–53, 2003.
[6]
{BBCA+19} Luca Becchetti, Marc Bury, Vincent Cohen-Addad, Fabrizio Grandoni, and Chris Schwiegelshohn. Oblivious dimension reduction for k-means: beyond subspaces and the Johnson–Lindenstrauss lemma. In Proceedings of the Symposium on Theory of Computing, 2019.
[7]
{BDM09} Christos Boutsidis, Petros Drineas, and Michael W. Mahoney. Unsupervised feature selection for the k-means clustering problem. In Advances in Neural Information Processing Systems, pages 153–161, 2009.
[8]
{BMI13} Christos Boutsidis and Malik Magdon-Ismail. Deterministic feature selection for k-means clustering. IEEE Transactions on Information Theory, 59(9):6099–6110, 2013.
[9]
{BZD10} Christos Boutsidis, Anastasios Zouzias, and Petros Drineas. Random projections for k-means clustering. In Advances in Neural Information Processing Systems, pages 298–306, 2010.
[10]
{BZMD15} Christos Boutsidis, Anastasios Zouzias, Michael W. Mahoney, and Petros Drineas. Randomized dimensionality reduction for k-means clustering. IEEE Transactions on Information Theory, 61(2):1045–1062, 2015.
[11]
{CEM+15} Michael B. Cohen, Sam Elder, Cameron Musco, Christopher Musco, and Madalina Persu. Dimensionality reduction for k-means clustering and low rank approximation. In Proceedings of the Symposium on Theory of Computing, pages 163–172, 2015.
[12]
{DFK+99} Petros Drineas, Alan M. Frieze, Ravi Kannan, Santosh Vempala, and V. Vinay. Clustering in large graphs and matrices. In Proceedings of the Symposium on Discrete Algorithms, pages 291–299, 1999.
[13]
{DG03} Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures & Algorithms, 22(1):60–65, 2003.
[14]
{DKS10} Anirban Dasgupta, Ravi Kumar, and Tamás Sarlós. A sparse Johnson–Lindenstrauss transform. In Proceedings of the Symposium on Theory of Computing, pages 341–350, 2010.
[15]
{Far90} Nariman Farvardin. A study of vector quantization for noisy channels. IEEE Transactions on Information Theory, 36(4):799–809, 1990.
[16]
{FSS13} Dan Feldman, Melanie Schmidt, and Christian Sohler. Turning big data into tiny data: Constant-size coresets for k-means, PCA, and projective clustering. In Proceedings of the Symposium on Discrete Algorithms, pages 1434–1453, 2013.
[17]
{IM98} Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the Symposium on Theory of Computing, pages 604–613, 1998.
[18]
{Jai10} Anil K. Jain. Data clustering: 50 years beyond k-means. Pattern Recognition Letters, 31(8):651–666, 2010.
[19]
{JDS11} Hervé Jégou, Matthijs Douze, and Cordelia Schmid. Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1):117–128, 2011.
[20]
{JL84} William Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. In Conference in Modern Analysis and Probability (New Haven, Connecticut, 1982), volume 26 of Contemporary Mathematics, pages 189–206, 1984.
[21]
{Kan18} Ravi Kannan. Intro and foundations of data science I. Tutorial at the Simons Institute, 2018. Available at https://www.youtube.com/watch?v=9GMT3FnQTGM.
[22]
{Kir34} M. Kirszbraun. Über die zusammenziehende und lipschitzsche Transformationen. Fundamenta Mathematicae, 22(1):77–108, 1934.
[23]
{KM05} B. Klartag and Shahar Mendelson. Empirical processes and random projections. Journal of Functional Analysis, 225(1):229–245, 2005.
[24]
{KN14} Daniel M. Kane and Jelani Nelson. Sparser Johnson–Lindenstrauss transforms. Journal of the ACM, 61(1):4, 2014.
[25]
{KW11} Felix Krahmer and Rachel Ward. New and improved Johnson–Lindenstrauss embeddings via the restricted isometry property. SIAM Journal on Mathematical Analysis, 43(3):1269–1281, 2011.
[26]
{Llo82} Stuart Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, 1982.
[27]
{LM00} Beatrice Laurent and Pascal Massart. Adaptive estimation of a quadratic functional by model selection. Annals of Statistics, pages 1302–1338, 2000.
[28]
{LN17} Kasper Green Larsen and Jelani Nelson. Optimality of the Johnson–Lindenstrauss lemma. In Proceedings of the Symposium on Foundations of Computer Science, pages 633–638, 2017.
[29]
{Nao18} Assaf Naor. Metric dimension reduction: A snapshot of the Ribe program. arXiv preprint arXiv:1809.02376, 2018.
[30]
{NPW14} Jelani Nelson, Eric Price, and Mary Wootters. New constructions of RIP matrices with fast multiplication and fewer rows. In Proceedings of the Symposium on Discrete Algorithms, pages 1515–1528, 2014.
[31]
{Sar06} Tamás Sarlós. Improved approximation algorithms for large matrices via random projections. In Proceedings of the Symposium on Foundations of Computer Science, pages 143–152, 2006.
[32]
{SW18} Christian Sohler and David P. Woodruff. Strong coresets for k-median and subspace approximation: Goodbye dimension. In Proceedings of the Symposium on Foundations of Computer Science, pages 802–813, 2018.




Published In

STOC 2019: Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing
June 2019
1258 pages
ISBN: 9781450367059
DOI: 10.1145/3313276

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Johnson-Lindenstrauss transform
  2. Kirszbraun theorem
  3. clustering
  4. dimension reduction
  5. k-means
  6. k-medians

Qualifiers

  • Research-article

Conference

STOC '19

Acceptance Rates

Overall Acceptance Rate 1,469 of 4,586 submissions, 32%


Cited By

  • (2024) Coresets for multiple ℓp regression. Proceedings of the 41st International Conference on Machine Learning, pages 53202–53233. DOI: 10.5555/3692070.3694250. Online publication date: 21-Jul-2024.
  • (2024) Making old things new. Proceedings of the 41st International Conference on Machine Learning, pages 12046–12086. DOI: 10.5555/3692070.3692550. Online publication date: 21-Jul-2024.
  • (2024) Fair Projections as a Means toward Balanced Recommendations. ACM Transactions on Intelligent Systems and Technology, 16(1):1–32. DOI: 10.1145/3664929. Online publication date: 30-Dec-2024.
  • (2024) Settling Time vs. Accuracy Tradeoffs for Clustering Big Data. Proceedings of the ACM on Management of Data, 2(3):1–25. DOI: 10.1145/3654976. Online publication date: 30-May-2024.
  • (2024) TS-RTPM-Net: Data-Driven Tensor Sketching for Efficient CP Decomposition. IEEE Transactions on Big Data, 10(1):1–11. DOI: 10.1109/TBDATA.2023.3310254. Online publication date: Feb-2024.
  • (2024) Random Projections for Curves in High Dimensions. Discrete & Computational Geometry. DOI: 10.1007/s00454-024-00710-5. Online publication date: 11-Dec-2024.
  • (2023) Dimension reduction for maximum matchings and the Fastest Mixing Markov Chain. Comptes Rendus. Mathématique, 361(G5):869–876. DOI: 10.5802/crmath.447. Online publication date: 18-Jul-2023.
  • (2023) k-median clustering via metric embedding. Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 73817–73838. DOI: 10.5555/3666122.3669350. Online publication date: 10-Dec-2023.
  • (2023) On generalization bounds for projective clustering. Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 71723–71754. DOI: 10.5555/3666122.3669262. Online publication date: 10-Dec-2023.
  • (2023) Sketching algorithms for sparse dictionary learning. Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 48431–48443. DOI: 10.5555/3666122.3668223. Online publication date: 10-Dec-2023.
