Abstract
Spherical k-means clustering was introduced to cluster high-dimensional textual data in modern data analysis. It aims to partition a given set of unit-length points into k sets so as to minimize the within-cluster sum of cosine dissimilarity. In this paper, we mainly study seeding algorithms for spherical k-means clustering, for its special case (with separable sets), and for its generalization (\(\alpha \)-spherical k-means clustering). For spherical k-means clustering with separable sets, we present a constant-factor approximation algorithm, which can also be generalized to \(\alpha \)-spherical separable k-means clustering. By carefully constructing a useful function, we also show that well-known seeding algorithms for the k-means problem, such as k-means++ and k-means||, can be applied directly to \(\alpha \)-spherical k-means clustering. In addition to the theoretical analysis, a numerical experiment is included.
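To make the objective and the seeding idea concrete, the following is a minimal sketch of k-means++ style seeding with the cosine dissimilarity \(1 - \langle x, c \rangle\). It is not the paper's algorithm, which the abstract does not spell out. The function names (`cosine_dissimilarity`, `spherical_cost`, `seed_spherical_kmeanspp`) and the use of NumPy are assumptions made for illustration. For unit-length vectors, \(\Vert x - c\Vert ^2 = 2(1 - \langle x, c \rangle )\), so sampling proportionally to the cosine dissimilarity coincides with the squared-distance sampling used by k-means++.

```python
import numpy as np


def cosine_dissimilarity(points, center):
    """Return 1 - <x, c> for each unit-length row x of `points`."""
    return 1.0 - points @ center


def spherical_cost(points, centers):
    """Sum over all points of the cosine dissimilarity to the nearest center."""
    similarities = points @ centers.T
    return float(np.sum(1.0 - similarities.max(axis=1)))


def seed_spherical_kmeanspp(points, k, rng):
    """Illustrative k-means++ style seeding (hypothetical helper, not from the paper).

    The first center is drawn uniformly; each later center is drawn with
    probability proportional to its dissimilarity to the closest chosen center.
    """
    n = points.shape[0]
    first = rng.integers(n)
    centers = [points[first]]
    # Clip tiny negative values caused by floating-point rounding.
    closest = np.clip(cosine_dissimilarity(points, points[first]), 0.0, None)
    for _ in range(k - 1):
        total = closest.sum()
        if total <= 0.0:
            # Every point coincides with a chosen center; fall back to uniform.
            idx = rng.integers(n)
        else:
            idx = rng.choice(n, p=closest / total)
        centers.append(points[idx])
        new_dissim = np.clip(cosine_dissimilarity(points, points[idx]), 0.0, None)
        closest = np.minimum(closest, new_dissim)
    return np.vstack(centers)


# Example usage on synthetic data projected onto the unit sphere.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 10))
data /= np.linalg.norm(data, axis=1, keepdims=True)
seeds = seed_spherical_kmeanspp(data, k=5, rng=rng)
print(spherical_cost(data, seeds))
```

The seeds returned here are data points, so they already have unit length; a Lloyd-type refinement step, if applied afterwards, would need to renormalize the updated centers.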
Acknowledgements
The first author is supported by the Higher Educational Science and Technology Program of Shandong Province (No. J17KA171). The second author is supported by Natural Science Foundation of China (No. 11531014). The third author is supported by Natural Science Foundation of China (No. 11871081). The fourth author is supported by Natural Science Foundation of China (No. 11801310).
Cite this article
Li, M., Xu, D., Zhang, D. et al. The seeding algorithms for spherical k-means clustering. J Glob Optim 76, 695–708 (2020). https://doi.org/10.1007/s10898-019-00779-w
DOI: https://doi.org/10.1007/s10898-019-00779-w