Skip to main content
Log in

Improved PTAS for the constrained k-means problem

  • Published:
Journal of Combinatorial Optimization Aims and scope Submit manuscript

Abstract

The k-means problem has been paid lots of attention in many fields, and each cluster of the k-means problem always satisfies locality property. In this paper, we study the constrained k-means problem, where the clusters do not satisfy locality property and can be an arbitrary partition of the set of points. Ding and Xu presented a unified framework with running time \(O(2^{poly (k/\epsilon )} (\log n)^{k+1} nd)\) by applying uniform sampling and simplex lemma techniques such that a collection of size \(O(2^{poly (k/\epsilon )} (\log n)^{k+1})\) of candidate sets containing approximate centers is obtained. Then, the collection is enumerated to get the one that can induce a \((1+\epsilon )\)-approximation solution for the constrained k-means problem. By applying \(D^2\)-sampling technique, Bhattacharya, Jaiswal, and Kumar presented an algorithm with running time \(O(2^{{\tilde{O}}(k/\epsilon )}nd)\), which is bounded by \(O(2^k( \frac{2123ek}{\epsilon ^3})^{64k/\epsilon }knd)\). The algorithm outputs a collection of size \(O(2^k( \frac{2123ek}{\epsilon ^3})^{64k/\epsilon })\) of candidate sets containing approximate centers. In this paper, we present an algorithm with running time \(O((\frac{1891ek}{\epsilon ^2})^{8k/\epsilon }nd)\) such that a collection of size \(O((\frac{1891ek}{\epsilon ^2})^{8k/\epsilon }n)\) of candidate sets containing approximate centers can be obtained.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2

Similar content being viewed by others

References

  • Aggarwal G, Panigrahy R, Feder T, Thomas D, Kenthapadi K, Khuller S, Zhu A (2010) Achieving anonymity via clustering. ACM Trans Algorithms 6(3):49:1–49:19

    Article  MathSciNet  MATH  Google Scholar 

  • Ahmadian S, Norouzi-Fard A, Svensson O, Ward J (2017) Better guarantees for k-means and Euclidean k-median by primal-dual algorithms. In: Proceedings of 58th annual symposium on foundations of computer science, FOCS, California, USA, pp 61–72

  • Aloise D, Deshpande A, Hansen P, Popat P (2009) NP-hardness of euclidean sum-of-squares clustering. Mach Learn 75(2):245–248

    Article  MATH  Google Scholar 

  • Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of 18th annual ACM-SIAM symposium on discrete algorithms, SODA, Louisiana, USA, pp 1027–1035

  • Awasthi P, Charikar M, Krishnaswamy R, Sinop AK (2015) The hardness of approximation of Euclidean k-means. In: Proceedings of 31st annual international symposium on computational geometry, SoCG, Eindhoven, The Netherlands, pp 754–767

  • Badoiu M, Har-Peled S, Indyk P (2002) Approximate clustering via core-sets. In: Proceedings of 34th annual ACM symposium on theory of computing, STOC, Québec, Canada, pp 250–257

  • Betzler N, Guo J, Niedermeier R (2010) Parameterized computational complexity of dodgson and young elections. Inf Comput 208(2):165–177

    Article  MathSciNet  MATH  Google Scholar 

  • Bhattacharya A, Jaiswal R, Kumar A (2016) Faster algorithms for the constrained k-means problem. In: Proceedings of 33rd annual symposium on theoretical aspects of computer science, STACS, Orléans, France, pp 16:1–16:13

  • Bhattacharya A, Jaiswal R, Kumar A (2018) Faster algorithms for the constrained k-means problem. Theory Comput Syst 62(1):93–115

    Article  MathSciNet  MATH  Google Scholar 

  • Cabello S, Giannopoulos P, Knauer C, Marx D, Rote G (2011) Geometric clustering: fixed-parameter tractability and lower bounds with respect to the dimension. ACM Trans Algorithms 7(4):43:1–43:27

  • Cohen-Addad V (2018) A fast approximation scheme for low-dimensional k-means. In: Proceedings 29th annual ACM-SIAM symposium on discrete algorithms, SODA, Louisiana, USA, pp 430–440

  • Cohen-Addad V, Klein PN, Mathieu C (2016) Local search yields approximation schemes for k-means and k-median in euclidean and minor-free metrics. In: Proceedings of 57th annual symposium on foundations of computer science, FOCS, New Jersey, USA, pp 353–364

  • Cygan M, Hajiaghayi M, Khuller S (2012) LP rounding for k-centers with non-uniform hard capacities. In: Proceedings of 53rd annual symposium on foundations of computer science, FOCS, New Jersey, USA, pp 273–282

  • Cygan M, Fomin FV, Kowalik L, Lokshtanov D, Marx D, Pilipczuk M, Pilipczuk M, Saurabh S (2015) Parameterized algorithms. Springer

  • De la Vega WF, Karpinski M, Kenyon C, Rabani Y (2003) Approximation schemes for clustering problems. In: Proceedings of 35th annual ACM symposium on theory of computing, STOC, California, USA, pp 50–58

  • Ding H, Xu J (2011) Solving the chromatic cone clustering problem via minimum spanning sphere. In: Proceedings of 38th annual international colloquium on automata, languages and programming, ICALP, Zurich, Switzerland, pp 773–784

  • Ding H, Xu J (2015) A unified framework for clustering constrained data without locality property. In: Proceedings of 26th annual ACM-SIAM symposium on discrete algorithms, SODA, California, USA, pp 1471–1490

  • Downey RG, Fellows MR (1999) Parameterized complexity. Monographs in computer science, Springer

    Book  MATH  Google Scholar 

  • Ene A, Har-Peled S, Raichel B (2013) Fast clustering with lower bounds: no customer too far, no shop too small. CoRR arXiv:1304.7318

  • Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R (eds) (1996) Advances in knowledge discovery and data mining. AAAI Press, Quebec

    Google Scholar 

  • Feldman D, Monemizadeh M, Sohler C (2007) A PTAS for k-means clustering based on weak coresets. In: Proceedings of 23rd annual ACM symposium on computational geometry, SoCG, Gyeongju, South Korea, pp 11–18

  • Feldman D, Schmidt M, Sohler C (2013) Turning big data into tiny data: constant-size coresets for k-means, PCA and projective clustering. In: Proceedings of 24th annual ACM-SIAM symposium on discrete algorithms, SODA, Louisiana, USA, pp 1434–1453

  • Feldmann AE (2015) Fixed parameter approximations for \(k\)-center problems in low highway dimension graphs. In: Proceedings of 42th annual international colloquium on automata, languages and programming, ICALP, Kyoto, Japan, pp 588–600

  • Feng Q, Huang N, Jiang X, Wang J (2018a) Dealing with several parameterized problems by random methods. Theor Comput Sci 734:94–104

    Article  MathSciNet  MATH  Google Scholar 

  • Feng Q, Li S, Zhou Z, Wang J (2018b) Parameterized algorithms for edge biclique and related problems. Theor Comput Sci 734:105–118

    Article  MathSciNet  MATH  Google Scholar 

  • Friggstad Z, Rezapour M, Salavatipour MR (2016) Local search yields a PTAS for k-means in doubling metrics. In: Proceedings of 57th annual symposium on foundations of computer science, FOCS, New Jersey, USA, pp 365–374

  • Gao J, Ping Q, Wang J (2018) Resisting re-identification mining on social graph data. World Wide Web. https://doi.org/10.1007/s11280-017-0524-3

  • Guo L, Shen H, Zhu W (2017) Efficient approximation algorithms for multi-antennae largest weight data retrieval. IEEE Trans Mob Comput 16(12):3320–3333

    Article  Google Scholar 

  • Hajiaghayi MT, Hu W, Li J, Li S, Saha B (2016) A constant factor approximation algorithm for fault-tolerant k-median. ACM Trans Algorithms 12(3):36:1–36:19

    Article  MathSciNet  MATH  Google Scholar 

  • Har-Peled S, Raichel B (2015) Net and prune: a linear time algorithm for Euclidean distance problems. J ACM 62(6):44:1–44:35

    Article  MathSciNet  MATH  Google Scholar 

  • Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58(301):13–30

    Article  MathSciNet  MATH  Google Scholar 

  • Inaba M, Katoh N, Imai H (1994) Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering (extended abstract). In: Proceedings of 10th annual symposium on computational geometry, SoCG, New York, USA, pp 332–339

  • Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323

    Article  Google Scholar 

  • Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Pattern Anal Mach Intell 22(1):4–37

    Article  Google Scholar 

  • Jaiswal R, Kumar M, Yadav P (2015) Improved analysis of D\({}^{2}\)-sampling based PTAS for k-means and other clustering problems. Inf Process Lett 115(2):100–103

    Article  MathSciNet  MATH  Google Scholar 

  • Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverman R, Wu AY (2004) A local search approximation algorithm for k-means clustering. Comput Geom 28(2–3):89–112

    Article  MathSciNet  MATH  Google Scholar 

  • Khuller S, Sussmann YJ (2000) The capacitated K-center problem. SIAM J Discrete Math 13(3):403–418

    Article  MathSciNet  MATH  Google Scholar 

  • Khuller S, Pless R, Sussmann YJ (2000) Fault tolerant K-center problems. Theor Comput Sci 242(1–2):237–245

    Article  MathSciNet  MATH  Google Scholar 

  • Kumar A, Sabharwal Y, Sen S (2010) Linear-time approximation schemes for clustering problems in any dimensions. J ACM 57(2):5:1–5:32

    Article  MathSciNet  MATH  Google Scholar 

  • Kumar N, Raichel B (2013) Fault tolerant clustering revisited. In: Proceedings of 25th annual Canadian conference on computational geometry, CCCG, Ontario, Canada

  • Li J, Yi K, Zhang Q (2010) Clustering with diversity. In: Proceedings of 37th annual international colloquium on automata, languages and programming, ICALP, Bordeaux, France, pp 188–200

  • Li W, Feng Q, Chen J, Hu S (2017) Improved kernel results for some FPT problems based on simple observations. Theor Comput Sci 657:20–27

  • Lin M, Feng Q, Chen J, Li W (2017) Partition on trees with supply and demand: Kernelization and algorithms. Theor Comput Sci 657:11–19

    Article  MathSciNet  MATH  Google Scholar 

  • Lin M, Feng Q, Wang J, Chen J, Fu B, Li W (2018) An improved FPT algorithm for almost forest deletion problem. Inf Process Lett 136:30–36

    Article  MathSciNet  MATH  Google Scholar 

  • Lloyd SP (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–136

    Article  MathSciNet  MATH  Google Scholar 

  • Mahajan M, Nimbhorkar P, Varadarajan KR (2012) The planar k-means problem is NP-hard. Theor Comput Sci 442:13–21

    Article  MathSciNet  MATH  Google Scholar 

  • Ostrovsky R, Rabani Y, Schulman LJ, Swamy C (2012) The effectiveness of lloyd-type methods for the k-means problem. J ACM 59(6):28:1–28:22

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jianxin Wang.

Additional information

This work is supported by the National Natural Science Foundation of China under Grants (61472449, 61420106009, 61672536, 71631008).

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Feng, Q., Hu, J., Huang, N. et al. Improved PTAS for the constrained k-means problem. J Comb Optim 37, 1091–1110 (2019). https://doi.org/10.1007/s10878-018-0340-4

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10878-018-0340-4

Keywords

Navigation