The seeding algorithms for spherical k-means clustering

Li, Min; Xu, Dachuan; Zhang, Dongmei; Zou, Juan

doi:10.1007/s10898-019-00779-w

Published: 03 May 2019

Volume 76, pages 695–708, (2020)

Journal of Global Optimization

Abstract

Spherical k-means clustering was introduced to cluster high-dimensional textual data in modern data analysis. It aims to partition a given set of unit-length points into k sets so as to minimize the within-cluster sum of cosine dissimilarities. In this paper, we study seeding algorithms for spherical k-means clustering, for its special case with separable sets, and for its generalization, \(\alpha \)-spherical k-means clustering. For spherical k-means clustering with separable sets, we present a constant-factor approximation algorithm, which also extends to \(\alpha \)-spherical separable k-means clustering. By constructing a suitable auxiliary function, we further show that well-known seeding algorithms for the k-means problem, such as k-means++ and k-means||, can be applied directly to \(\alpha \)-spherical k-means clustering. In addition to the theoretical analysis, a numerical experiment is included.
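
As a rough illustration of the seeding idea summarized in the abstract, the Python sketch below applies k-means++-style sampling to unit-length points using cosine dissimilarity. It is not the authors' algorithm, whose details do not appear on this page. It relies only on the identity ||x - c||^2 = 2(1 - <x, c>) for unit vectors, which makes sampling proportional to cosine dissimilarity equivalent to the usual squared-distance sampling of k-means++. The function names (cosine_dissimilarity, seed_spherical_kmeans, clustering_cost) are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def cosine_dissimilarity(points, center):
    """Return 1 - <x, c> for every unit-length row x of `points`."""
    return 1.0 - points @ center


def seed_spherical_kmeans(points, k, rng=None):
    """Pick k seeds from `points` with k-means++-style sampling.

    Assumption: rows of `points` already have unit Euclidean length.
    Each new seed is drawn with probability proportional to its cosine
    dissimilarity to the closest seed chosen so far.
    """
    rng = np.random.default_rng(rng)
    n = points.shape[0]

    # The first seed is drawn uniformly at random.
    centers = [points[rng.integers(n)]]
    closest = cosine_dissimilarity(points, centers[0])

    for _ in range(1, k):
        # Guard against tiny negative values caused by rounding.
        closest = np.clip(closest, 0.0, None)
        total = closest.sum()
        if total <= 0.0:
            # Every point coincides with a seed; fall back to uniform.
            idx = rng.integers(n)
        else:
            idx = rng.choice(n, p=closest / total)
        centers.append(points[idx])
        closest = np.minimum(
            closest, cosine_dissimilarity(points, points[idx])
        )

    return np.vstack(centers)


def clustering_cost(points, centers):
    """Sum over points of the cosine dissimilarity to the nearest center."""
    similarities = points @ centers.T
    return float(np.sum(1.0 - similarities.max(axis=1)))
```

The seeds returned this way would typically serve as the starting centers for a Lloyd-type iteration; clustering_cost evaluates the within-cluster sum of cosine dissimilarity that the abstract names as the objective.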

Acknowledgements

The first author is supported by the Higher Educational Science and Technology Program of Shandong Province (No. J17KA171). The second author is supported by the Natural Science Foundation of China (No. 11531014). The third author is supported by the Natural Science Foundation of China (No. 11871081). The fourth author is supported by the Natural Science Foundation of China (No. 11801310).

Author information

Authors and Affiliations

School of Mathematics and Statistics, Shandong Normal University, Jinan, 250014, P.R. China
Min Li
Beijing Institute for Scientific and Engineering Computing, Beijing University of Technology, Beijing, 100124, P.R. China
Dachuan Xu
School of Computer Science and Technology, Shandong Jianzhu University, Jinan, 250101, P.R. China
Dongmei Zhang
School of Mathematical Sciences, Qufu Normal University, Qufu, 273165, P.R. China
Juan Zou

Corresponding author

Correspondence to Dongmei Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, M., Xu, D., Zhang, D. et al. The seeding algorithms for spherical k-means clustering. J Glob Optim 76, 695–708 (2020). https://doi.org/10.1007/s10898-019-00779-w

Received: 28 September 2018
Accepted: 20 April 2019
Published: 03 May 2019
Issue Date: April 2020
DOI: https://doi.org/10.1007/s10898-019-00779-w
