Abstract
In this paper, we consider the uniform capacitated k-means problem (UC-k-means), an extension of the classical k-means problem (k-means) in machine learning. In the UC-k-means, we are given a set \(\mathcal {D}\) of n points in d-dimensional space and an integer k. Every point in the d-dimensional space has a uniform capacity, which is an upper bound on the number of points in \(\mathcal {D}\) that can be connected to this point. Every pair of points in the space has an associated connecting cost, equal to the square of the distance between the two points. We want to choose at most k points in the space as centers and connect every point in \(\mathcal {D}\) to some center without violating the capacity constraint, such that the total connecting cost is minimized. Based on the technique of local search, we present a bi-criteria approximation algorithm for the UC-k-means, which has a constant approximation guarantee and violates the cardinality constraint within a constant factor.
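To make the objective concrete, the following minimal Python sketch evaluates the UC-k-means cost of a given assignment and, for a toy instance, finds the cheapest capacity-respecting assignment by brute force. It is not the local search algorithm of the paper: the centers are assumed to be fixed in advance, and the function names (`sq_dist`, `assignment_cost`, `best_assignment`) and the sample data are hypothetical, chosen only for illustration.

```python
from itertools import product


def sq_dist(p, q):
    """Squared Euclidean distance, i.e. the connecting cost of a point pair."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assignment_cost(points, centers, assign, capacity):
    """Total connecting cost of `assign`, or None if some center is overloaded.

    assign[i] is the index of the center serving points[i]; every center may
    serve at most `capacity` points (the uniform capacity).
    """
    load = [0] * len(centers)
    for c in assign:
        load[c] += 1
    if any(l > capacity for l in load):
        return None
    return sum(sq_dist(p, centers[c]) for p, c in zip(points, assign))


def best_assignment(points, centers, capacity):
    """Brute-force the cheapest feasible assignment (exponential; toy sizes only)."""
    best_cost, best_assign = None, None
    for assign in product(range(len(centers)), repeat=len(points)):
        cost = assignment_cost(points, centers, assign, capacity)
        if cost is not None and (best_cost is None or cost < best_cost):
            best_cost, best_assign = cost, assign
    return best_cost, best_assign


# Hypothetical toy instance: four points on a line, two fixed centers.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
centers = [(1.0, 0.0), (10.0, 0.0)]

print(best_assignment(points, centers, capacity=3))  # (2.0, (0, 0, 0, 1))
print(best_assignment(points, centers, capacity=2))  # (65.0, (0, 0, 1, 1))
```

With capacity 3, each point goes to its nearest center. With capacity 2, one of the three left-hand points is forced onto the distant center, which raises the cost from 2 to 65 and illustrates why the capacity constraint makes the problem harder than classical k-means.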
Acknowledgements
The first two authors are supported by the Natural Science Foundation of China (No. 11531014). The third author is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) grant 06446, and by the Natural Science Foundation of China (Nos. 11771386, 11728104). The fourth author is supported by the Natural Science Foundation of China (No. 11871081).
Cite this article
Han, L., Xu, D., Du, D. et al. An approximation algorithm for the uniform capacitated k-means problem. J Comb Optim 44, 1812–1823 (2022). https://doi.org/10.1007/s10878-020-00550-y
DOI: https://doi.org/10.1007/s10878-020-00550-y