Abstract
In this paper, we study the k-means problem with (nonuniform) penalties (k-MPWP), a natural generalization of the classic k-means problem. In the k-MPWP, we are given a set \( {\mathcal {D}} \subset {\mathbb {R}}^d\) of n clients, a penalty cost \(p_j>0\) for each \(j \in {\mathcal {D}}\), and an integer \(k \le n\). The goal is to open a center subset \(F \subset {\mathbb {R}}^d\) with \( |F| \le k\) and to choose a client subset \(P \subseteq {\mathcal {D}} \) as the penalized client set such that the total cost is minimized. The total cost is the sum of the squared distances from each client in \( {\mathcal {D}} \backslash P \) to its nearest open center, plus the sum of the penalty costs of the clients in \(P\). We offer a local search \(( 81+ \varepsilon )\)-approximation algorithm for the k-MPWP using single-swap operations. We further improve the approximation ratio to \(( 25+ \varepsilon )\) using multi-swap operations.
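To make the objective concrete, the following is a minimal Python sketch, not the algorithm analyzed in the paper. It rests on one fact that follows directly from the definition: once the center set \(F\) is fixed, each client is best off paying the smaller of its penalty \(p_j\) and its squared distance to the nearest open center. Everything else is an illustrative assumption: the function names `sq_dist`, `kmpwp_cost` and `single_swap_local_search`, the restriction of centers to a finite `candidates` list (the problem itself allows centers anywhere in \({\mathbb {R}}^d\)), the choice of the first k candidates as the starting solution, and the improvement threshold `eps`. No approximation ratio is claimed for this sketch.

```python
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmpwp_cost(clients: Sequence[Point], penalties: Sequence[float],
               centers: Sequence[Point]) -> Tuple[float, List[int]]:
    """Cost of a fixed center set, with the best penalized set for it.

    Each client pays min(penalty, squared distance to nearest center).
    Returns (total cost, indices of penalized clients).
    """
    total = 0.0
    penalized: List[int] = []
    for j, (x, p) in enumerate(zip(clients, penalties)):
        d = min(sq_dist(x, c) for c in centers)
        if p < d:          # cheaper to pay the penalty than to serve client j
            total += p
            penalized.append(j)
        else:
            total += d
    return total, penalized


def single_swap_local_search(clients: Sequence[Point],
                             penalties: Sequence[float], k: int,
                             candidates: Sequence[Point],
                             eps: float = 0.0) -> Tuple[List[Point], float]:
    """Hypothetical single-swap local search over a finite candidate set.

    Assumption: centers come from `candidates`, and a swap is accepted only
    if it reduces the cost below (1 - eps) times the current cost.
    """
    centers = list(candidates[:k])  # arbitrary starting solution (assumption)
    cost, _ = kmpwp_cost(clients, penalties, centers)
    improved = True
    while improved:
        improved = False
        # Try replacing one open center by one unopened candidate.
        for i, cand in product(range(len(centers)), candidates):
            if cand in centers:
                continue
            trial = centers[:i] + [cand] + centers[i + 1:]
            trial_cost, _ = kmpwp_cost(clients, penalties, trial)
            if trial_cost < (1 - eps) * cost:
                centers, cost = trial, trial_cost
                improved = True
                break  # restart the scan from the new solution
    return centers, cost


# Usage: two tight pairs of clients plus one far outlier with a small penalty.
clients = [(0, 0), (0, 1), (10, 0), (10, 1), (100, 100)]
penalties = [5, 5, 5, 5, 3]
centers, cost = single_swap_local_search(clients, penalties, 2, clients)
print(centers, cost, kmpwp_cost(clients, penalties, centers)[1])
```

On this toy instance the search ends with one center per pair and penalizes only the outlier, which is the behavior the penalty term is meant to allow. A multi-swap variant would exchange several centers at once instead of one.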



Acknowledgements
The research of the first author is supported by the Higher Educational Science and Technology Program of Shandong Province (No. J15LN23) and the Science and Technology Development Plan Project of Jinan City (No. 201401211). The second author is supported by the Ri-Xin Talents Project of Beijing University of Technology. The third author is supported by the Natural Science Foundation of China (No. 11501412). The fourth author is supported by the Natural Science Foundation of China (No. 11531014). The fifth author is supported by Beijing Excellent Talents Funding (No. 2014000020124G046).
Additional information
A preliminary version of this paper appeared in Proceedings of the 23rd International Computing and Combinatorics Conference, pp. 568–574, 2017.
Cite this article
Zhang, D., Hao, C., Wu, C. et al. Local search approximation algorithms for the k-means problem with penalties. J Comb Optim 37, 439–453 (2019). https://doi.org/10.1007/s10878-018-0278-6
DOI: https://doi.org/10.1007/s10878-018-0278-6