The seeding algorithms for spherical k-means clustering

Journal of Global Optimization

Abstract

Spherical k-means clustering was introduced to cluster high-dimensional textual data in modern data analysis. It aims to partition a given set of unit-length points into k sets so as to minimize the within-cluster sum of cosine dissimilarities. In this paper, we study seeding algorithms for spherical k-means clustering, for its special case with separable sets, and for its generalization, \(\alpha \)-spherical k-means clustering. For spherical k-means clustering with separable sets, we present a constant-factor approximation algorithm, which extends to \(\alpha \)-spherical separable k-means clustering. By constructing a suitable auxiliary function, we also show that well-known seeding algorithms for the k-means problem, such as k-means++ and k-means||, can be applied directly to \(\alpha \)-spherical k-means clustering. In addition to the theoretical analysis, numerical experiments are included.
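As a rough illustration of what seeding looks like in this setting, the following is a minimal sketch of k-means++-style sampling on the unit sphere. It is not taken from the paper and covers only the plain spherical case, not the separable or \(\alpha \)-spherical variants. It relies on the identity \(\Vert x-c\Vert ^2 = 2(1-\langle x,c\rangle )\) for unit vectors, so sampling proportionally to cosine dissimilarity coincides with the usual squared-distance sampling. The function names `normalize`, `cosine_dissimilarity`, `seed_centers` and `clustering_cost` are hypothetical and chosen for this example only.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def cosine_dissimilarity(x, c):
    """Return 1 - <x, c> for unit vectors, clamped at 0 against rounding."""
    return max(0.0, 1.0 - sum(a * b for a, b in zip(x, c)))


def seed_centers(points, k, rng=None):
    """Pick k initial centers from unit-length points (assumed sketch).

    The first center is uniform at random; each later center is drawn with
    probability proportional to its dissimilarity to the nearest chosen center.
    """
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [rng.choice(points)]
    # closest[i] holds the dissimilarity of point i to its nearest center.
    closest = [cosine_dissimilarity(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(closest) <= 0.0:
            # Every point coincides with a chosen center: fall back to uniform.
            nxt = rng.choice(points)
        else:
            nxt = rng.choices(points, weights=closest, k=1)[0]
        centers.append(nxt)
        closest = [min(d, cosine_dissimilarity(p, nxt))
                   for p, d in zip(points, closest)]
    return centers


def clustering_cost(points, centers):
    """Within-cluster sum of cosine dissimilarities for the given centers."""
    return sum(min(cosine_dissimilarity(p, c) for c in centers) for p in points)
```

A typical use would normalize the raw vectors first, for example `pts = [normalize(p) for p in raw]`, then call `seed_centers(pts, k)` and pass the result to a Lloyd-type refinement step, which is outside the scope of this sketch.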

Acknowledgements

The first author is supported by the Higher Educational Science and Technology Program of Shandong Province (No. J17KA171). The second author is supported by the Natural Science Foundation of China (No. 11531014). The third author is supported by the Natural Science Foundation of China (No. 11871081). The fourth author is supported by the Natural Science Foundation of China (No. 11801310).

Author information

Corresponding author

Correspondence to Dongmei Zhang.

About this article

Cite this article

Li, M., Xu, D., Zhang, D. et al. The seeding algorithms for spherical k-means clustering. J Glob Optim 76, 695–708 (2020). https://doi.org/10.1007/s10898-019-00779-w

  • DOI: https://doi.org/10.1007/s10898-019-00779-w
