Abstract
The k-means problem is very classic and important in computer science and machine learning, so there are many variants presented depending on different backgrounds, such as the k-means problem with penalties, the spherical k-means clustering, and so on. Since the k-means problem is NP-hard, the research of its approximation algorithm is very hot. In this paper, we apply a bi-criteria seeding algorithm to both k-means problem with penalties and spherical k-means problem, and improve (upon) the performance guarantees given by the k-means++ algorithm for these two problems.
Similar content being viewed by others
References
Aggarwal A, Deshpande A, Kannan R (2009) Adaptive sampling for \(k\)-means clustering. In: Proceedings of APPROX and RANDOM, pp 15–28
Ahmadian S, Norouzi-Fard A, Svensson O, Ward J (2017) Better guarantees for \(k\)-means and Euclidean \(k\)-median by primal-dual algorithms. In: Proceedings of FOCS, pp 61–72
Aloise D, Deshpande A, Hansen P, Popat P (2009) NP-hardness of Euclidean sum-of-squares clustering. Mach Learn 75:245–248
Arthur D, Vassilvitskii S (2007) \(k\)-means++: the advantages of careful seeding. In: Proceedings of SODA, pp 1027–1035
Awasthi P, Charikar M, Krishnaswamy R, Sinop AK (2015) The hardness of approximation of Euclidean \(k\)-means. In: Proceedings of SoCG, pp 754–767
Bachem O, Lucic M, Hassani SH, Krause A (2016) Approximate \(k\)-means++ in sublinear time. In: Proceedings of AAAI, pp 1459–1467
Bachem O, Lucic M, Krause A (2017) Distributed and provably good seedings for \(k\)-means in constant rounds. In: Proceedings of ICML, pp 292–300
Blömer J, Lammersen C, Schmidt M, Sohler C (2016) Theoretical analysis of the \(k\)-means algorithm—a survey. In: Algorithm engineering. Springer, pp 81–116
Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42:143–175
Drineas P, Frieze A, Kannan R, Vempala S, Vinay V (2004) Clustering large graphs via the singular value decomposition. Mach Learn 56:9–33
Endo Y, Miyamoto S (2015) Spherical \(k\)-means++ clustering. In: Proceedings of MDAI, pp 103–114
Feng Q, Zhang Z, Shi F, Wang J (2019) An improved approximation algorithm for the \(k\)-means problem with penalties. In: Proceedings of FAW, pp 170–181
Lee E, Schmidt M, Wright J (2017) Improved and simplified inapproximability for \(k\)-means. Inf Process Lett 120:40–43
Li M, Xu D, Zhang D, Zou J (2019) The seeding algorithms for spherical \(k\)-means clustering. J Glob Optim. https://doi.org/10.1007/s10898-019-00779-w
Li M, Xu D, Yue J, Zhang D, Zhang P (2020) The seeding algorithm for \(k\)-means problem with penalties. J Comb Optim 39:15–32
Lloyd S (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28:21–33
Makarychev K, Makarychev Y, Sviridenko M, Ward J (2016) A bi-criteria approximation algorithm for \(k\)-means. In: Proceedings of APPROX/RONDOM, pp 14:1–14:20
Ostrovsky R, Rabani Y, Schulman L, Swamy C (2012) The effectiveness of Lloyd-type methods for the \(k\)-means problem. J ACM 59:28:1–28:22
Tseng GC (2007) Penalized and weighted \(k\)-means for clustering with scattered objects and prior information in high-throughput biological data. Bioinformatics 23:2247–2255
Wei D (2016) A constant-factor bi-criteria approximation guarantee for \(k\)-means++. In: Proceedings of NIPS, pp 604–612
Wu X, Kumar V, Quinlan J, Ross Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Yu PS, Zhou Z, Steinbach M, Hand DJ, Steinberg D (2008) Top 10 algorithms in data mining. Knowl Inf Syst 14:1–37
Xu X, Ding S, Shi Z (2018) An improved density peaks clustering algorithm with fast finding cluster centers. Knowl Based Syst 158:65–74
Xu D, Xu Y, Zhang D (2017) A survey on algorithm for \(k\)-means problem and its variants. Oper Res Trans 21:101–109
Xu D, Xu Y, Zhang D (2018) A survey on the initialization methods for the \(k\)-means algorithm. Oper Res Trans 22:31–40
Zhang D, Cheng Y, Li M, Wang Y, Xu D (2019) Local search approximation algorithms for the spherical \(k\)-means problem. In: Proceedings of AAIM, pp 341–351
Zhang D, Hao C, Wu C, Xu D, Zhang Z (2018) A local search approximation algorithm for the \(k\)-means problem with penalties. J Comb Optim 37:439–453
Zhao Y, Karypis G (2001) Criterion functions for document clustering: experiments and analysis. Technical report \(\sharp \)01-40, Department of Computer Science, University of Minnesota
Acknowledgements
The author is supported by Higher Educational Science and Technology Program of Shandong Province (No. J17KA171) and Shandong Provincial Natural Science Foundation (No. ZR2019MA032) of China.
Author information
Authors and Affiliations
Corresponding author
Additional information
Dedicated to Professor Minyi Yue on the Occasion of His 100th Birthday.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Li, M. The bi-criteria seeding algorithms for two variants of k-means problem. J Comb Optim 44, 1693–1704 (2022). https://doi.org/10.1007/s10878-020-00537-9
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10878-020-00537-9