DOI: 10.1145/1247069.1247072
Article

A PTAS for k-means clustering based on weak coresets

Published: 06 June 2007

Abstract

Given a point set $P \subseteq \mathbb{R}^d$, the k-means clustering problem is to find a set $C = (c_1, \dots, c_k)$ of k points and a partition of P into k clusters $C_1, \dots, C_k$ such that the sum of squared errors $\sum_{i=1}^{k} \sum_{p \in C_i} \lVert p - c_i \rVert_2^2$ is minimized. For given centers, this cost function is minimized by assigning each point to its nearest center. The k-means cost function is probably the most widely used cost function in the area of clustering.

In this paper we show that every unweighted point set P has a weak $(\varepsilon, k)$-coreset of size $\mathrm{Poly}(k, 1/\varepsilon)$ for the k-means clustering problem, i.e., its size is independent of the cardinality $|P|$ of the point set and of the dimension d of the Euclidean space $\mathbb{R}^d$. A weak coreset is a weighted set $S \subseteq P$ together with a set T such that T contains a $(1+\varepsilon)$-approximation of the optimal cluster centers for P, and for every set of k centers from T the cost of the centers for S is a $(1 \pm \varepsilon)$-approximation of their cost for P.

We apply our weak coreset to obtain a PTAS for the k-means clustering problem with running time $O(nkd + d \cdot \mathrm{Poly}(k/\varepsilon) + 2^{\tilde{O}(k/\varepsilon)})$.
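The cost function and the (1±ε) comparison in the abstract can be illustrated with a minimal Python sketch. It is not the paper's coreset construction: the function names (`sq_dist`, `kmeans_cost`, `within_factor`), the toy point set, and the hand-picked weighted subset S are all assumptions made for illustration. The sketch only evaluates the k-means cost for fixed centers (each point pays its squared distance to the nearest center) and checks whether the weighted cost on S is within a (1±ε) factor of the cost on P.

```python
from typing import Optional, Sequence

Point = Sequence[float]


def sq_dist(p: Point, c: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c))


def kmeans_cost(points: Sequence[Point],
                centers: Sequence[Point],
                weights: Optional[Sequence[float]] = None) -> float:
    """Weighted k-means cost for fixed centers.

    Each point contributes its weight times the squared distance
    to its nearest center; unweighted points have weight 1.
    """
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min(sq_dist(p, c) for c in centers)
               for p, w in zip(points, weights))


def within_factor(cost_s: float, cost_p: float, eps: float) -> bool:
    """True if cost_s lies in [(1 - eps) * cost_p, (1 + eps) * cost_p]."""
    return (1 - eps) * cost_p <= cost_s <= (1 + eps) * cost_p


# Toy data (assumption): two tight pairs on the real line, d = 1.
P = [(0.0,), (2.0,), (10.0,), (12.0,)]
centers = [(1.0,), (11.0,)]

# Hypothetical weighted subset S of P, chosen by hand for illustration only.
S = [(0.0,), (12.0,)]
w = [2.0, 2.0]

cost_P = kmeans_cost(P, centers)      # four points, each at squared distance 1
cost_S = kmeans_cost(S, centers, w)   # two points of weight 2, each at squared distance 1
print(cost_P, cost_S, within_factor(cost_S, cost_P, eps=0.1))
```

For these particular centers the weighted subset reproduces the cost of P exactly. A weak coreset only has to give this guarantee for sets of k centers drawn from the candidate set T, which is the sense in which the abstract calls it weak.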


    Published In

    SCG '07: Proceedings of the twenty-third annual symposium on Computational geometry
    June 2007
    404 pages
    ISBN:9781595937056
    DOI:10.1145/1247069
    • Program Chair: Jeff Erickson
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. approximation
    2. coresets
    3. geometric optimization
    4. k-mean

    Qualifiers

    • Article

    Acceptance Rates

    Overall Acceptance Rate 625 of 1,685 submissions, 37%
