
The seeding algorithm for spherical k-means clustering with penalties

Journal of Combinatorial Optimization

Abstract

Spherical k-means clustering, a known NP-hard variant of the k-means problem, has broad applications in data mining. In contrast to k-means, it aims to partition a given collection of data distributed on a spherical surface into k sets so as to minimize the within-cluster sum of cosine dissimilarity. In this paper, we introduce spherical k-means clustering with penalties and give a \(2\max \{2,M\}(1+M)(\ln k+2)\)-approximation algorithm. Moreover, we prove that on separable instances of spherical k-means clustering with penalties, our algorithm achieves an approximation ratio of \(2\max \{3,M+1\}\) with high probability, where M is the ratio of the maximal to the minimal penalty cost of the given data set.
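To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It rests on several assumptions that the abstract does not spell out: that cosine dissimilarity between unit vectors is taken as one minus their inner product, that each point pays the smaller of its nearest-center dissimilarity and its penalty, and that seeding follows a k-means++-style rule that samples each new center with probability proportional to that penalized cost. All function names (`penalized_cost`, `general_ratio`, `separable_ratio`, `seed_centers`) are hypothetical.

```python
import math
import random


def cosine_dissimilarity(x, c):
    """Assumed dissimilarity 1 - <x, c> for unit vectors, clamped at 0 against rounding."""
    return max(0.0, 1.0 - sum(a * b for a, b in zip(x, c)))


def penalized_cost(points, penalties, centers):
    """Assumed objective: each point pays min(nearest-center dissimilarity, its penalty)."""
    total = 0.0
    for x, p in zip(points, penalties):
        nearest = min(cosine_dissimilarity(x, c) for c in centers)
        total += min(nearest, p)
    return total


def penalty_ratio(penalties):
    """M from the abstract: maximal penalty divided by minimal penalty."""
    return max(penalties) / min(penalties)


def general_ratio(k, M):
    """Evaluate the stated bound 2 * max{2, M} * (1 + M) * (ln k + 2)."""
    return 2 * max(2, M) * (1 + M) * (math.log(k) + 2)


def separable_ratio(M):
    """Evaluate the stated bound 2 * max{3, M + 1} for separable instances."""
    return 2 * max(3, M + 1)


def seed_centers(points, penalties, k, rng):
    """k-means++-style seeding sketch (an assumption, not taken from the paper).

    The first center is drawn uniformly; each later center is drawn with
    probability proportional to the point's current penalized cost.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [
            min(min(cosine_dissimilarity(x, c) for c in centers), p)
            for x, p in zip(points, penalties)
        ]
        if sum(weights) <= 0.0:
            # Every point already has zero cost; fall back to uniform sampling.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


if __name__ == "__main__":
    pts = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
    pens = [0.5, 0.25, 2.0]
    M = penalty_ratio(pens)
    chosen = seed_centers(pts, pens, 2, random.Random(0))
    print(M, general_ratio(2, M), separable_ratio(M))
    print(chosen, penalized_cost(pts, pens, chosen))
```

Capping each sampling weight at the point's penalty means that a point which is cheap to discard is unlikely to be picked as a center, which is the intuition behind using penalties to absorb outliers. Note also that both bounds grow with M, so the guarantees are strongest when the penalties are close to uniform.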

Acknowledgements

The authors Sai Ji, Dachuan Xu and Dongmei Zhang are supported by the National Natural Science Foundation of China (No. 11871081). The third author, Longkun Guo, is supported by the National Natural Science Foundation of China (No. 61772005) and the Natural Science Foundation of Fujian Province (No. 2017J01753). The fourth author, Min Li, is supported by the Higher Educational Science and Technology Program of Shandong Province (No. J17KA171) and the Natural Science Foundation of Shandong Province of China (No. ZR2019MA032).

Author information

Corresponding author

Correspondence to Longkun Guo.

Additional information

A preliminary version of this paper appeared in Proceedings of the 13th International Conference on Algorithmic Aspects in Information and Management, pp. 149–158, 2019.

About this article

Cite this article

Ji, S., Xu, D., Guo, L. et al. The seeding algorithm for spherical k-means clustering with penalties. J Comb Optim 44, 1977–1994 (2022). https://doi.org/10.1007/s10878-020-00569-1
