Abstract
The k-means problem is a classic NP-hard problem in machine learning and computational geometry. Its goal is to partition a given point set into k clusters so that the sum of squared distances from each point to its cluster center is minimized. The k-means problem with penalties, a generalization of the k-means problem, allows some points to remain unclustered, each at the cost of a penalty. In this paper, we study the k-means problem with penalties using a seeding algorithm. We show that its accuracy depends only on the ratio of the maximal penalty value to the minimal one. When the penalty is uniform, the approximation factor reduces to that of the k-means problem. Our result thus generalizes k-means++ for the k-means problem to the penalty version. Numerical experiments show that our seeding algorithm is more effective than the algorithm without seeding.
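The abstract does not spell out the seeding rule, so the following is only a minimal sketch of how k-means++-style seeding might carry over to the penalty version. It assumes that each point contributes the smaller of its squared distance to the nearest chosen center and its own penalty, that the first center is drawn uniformly, and that later centers are sampled in proportion to that contribution. The function names `sq_dist`, `penalized_cost`, and `seed_with_penalties` are hypothetical and are not taken from the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def penalized_cost(points, penalties, centers):
    """Objective sketch: each point pays the smaller of its squared
    distance to the nearest center and its own penalty."""
    return sum(
        min(min(sq_dist(p, c) for c in centers), pen)
        for p, pen in zip(points, penalties)
    )


def seed_with_penalties(points, penalties, k, rng=None):
    """Pick k initial centers by penalty-capped D^2 sampling.

    Assumption (not confirmed by the abstract): the sampling weight of a
    point is min(squared distance to nearest center, penalty).
    """
    rng = rng or random.Random()
    # Assumption: the first center is drawn uniformly, as in k-means++.
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [
            min(min(sq_dist(p, c) for c in centers), pen)
            for p, pen in zip(points, penalties)
        ]
        if sum(weights) == 0:
            # Every point already has zero cost; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (100.0, 100.0)]
    pens = [1.0] * len(pts)  # uniform penalty
    chosen = seed_with_penalties(pts, pens, k=2, rng=random.Random(0))
    print(chosen, penalized_cost(pts, pens, chosen))
```

Under these assumptions, capping the weight at the penalty keeps a distant outlier from dominating the sampling distribution, and setting every penalty to infinity recovers ordinary squared-distance sampling.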
Acknowledgements
The research of the first author is supported by the Higher Educational Science and Technology Program of Shandong Province (No. J17KA171). The second author is supported by the National Natural Science Foundation of China (No. 11531014). The third author is supported by the National Natural Science Foundation of China (Nos. 11626148 and 11701342) and the Natural Science Foundation of Shandong Province (No. ZR2016AQ01). The fourth author’s research is supported by the National Natural Science Foundation of China (No. 11871081). The fifth author is supported by the National Natural Science Foundation of China (Nos. 61672323 and 61972228).
Cite this article
Li, M., Xu, D., Yue, J. et al. The seeding algorithm for k-means problem with penalties. J Comb Optim 39, 15–32 (2020). https://doi.org/10.1007/s10878-019-00450-w