Abstract
Spherical k-means clustering, a known NP-hard variant of the k-means problem, has broad applications in data mining. In contrast to k-means, it aims to partition a collection of given data distributed on a spherical surface into k sets so as to minimize the within-cluster sum of cosine dissimilarity. In this paper, we introduce spherical k-means clustering with penalties and give a \(2\max \{2,M\}(1+M)(\ln k+2)\)-approximation algorithm, where M is the ratio of the maximal to the minimal penalty cost of the given data set. Moreover, we prove that on separable instances of spherical k-means clustering with penalties, our algorithm achieves an approximation ratio of \(2\max \{3,M+1\}\) with high probability.
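To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes that the cosine dissimilarity between unit vectors x and c is 1 - <x, c>, and that in the penalized problem each point pays the cheaper of its dissimilarity to the nearest center and its own penalty. The function names (`normalize`, `cosine_dissimilarity`, `penalized_cost`, `penalty_ratio`, `first_bound`) are hypothetical and chosen only for illustration.

```python
import math


def normalize(v):
    """Scale a nonzero vector to unit length so that it lies on the sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_dissimilarity(x, c):
    """Assumed measure: 1 - <x, c> for unit vectors x and c."""
    return 1.0 - sum(a * b for a, b in zip(x, c))


def penalized_cost(points, penalties, centers):
    """Assumed objective: each point pays either its dissimilarity to the
    nearest center or its penalty, whichever is smaller."""
    total = 0.0
    for x, p in zip(points, penalties):
        nearest = min(cosine_dissimilarity(x, c) for c in centers)
        total += min(nearest, p)
    return total


def penalty_ratio(penalties):
    """M: ratio of the maximal to the minimal penalty (penalties must be > 0)."""
    return max(penalties) / min(penalties)


def first_bound(k, M):
    """Evaluate the stated ratio 2 * max{2, M} * (1 + M) * (ln k + 2)."""
    return 2 * max(2, M) * (1 + M) * (math.log(k) + 2)
```

For example, with unit points (1, 0), (0, 1), (-1, 0), centers (1, 0) and (0, 1), and penalties 0.5, 0.5, 1.0, the first two points coincide with a center, and the third is at dissimilarity 1 from its nearest center, which equals its penalty. The bound grows logarithmically in k and roughly quadratically in M, so instances with widely varying penalties get a weaker guarantee.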
Acknowledgements
The authors Sai Ji, Dachuan Xu and Dongmei Zhang are supported by the National Natural Science Foundation of China (No. 11871081). The third author, Longkun Guo, is supported by the National Natural Science Foundation of China (No. 61772005) and the Natural Science Foundation of Fujian Province (No. 2017J01753). The fourth author, Min Li, is supported by the Higher Educational Science and Technology Program of Shandong Province (No. J17KA171) and the Natural Science Foundation of Shandong Province (No. ZR2019MA032) of China.
Additional information
A preliminary version of this paper appeared in Proceedings of the 13th International Conference on Algorithmic Aspects in Information and Management, pp. 149–158, 2019.
Cite this article
Ji, S., Xu, D., Guo, L. et al. The seeding algorithm for spherical k-means clustering with penalties. J Comb Optim 44, 1977–1994 (2022). https://doi.org/10.1007/s10878-020-00569-1
DOI: https://doi.org/10.1007/s10878-020-00569-1