
The seeding algorithm for k-means problem with penalties

Journal of Combinatorial Optimization

Abstract

The k-means problem is a classic NP-hard problem in machine learning and computational geometry. Its goal is to partition a given point set into k clusters so that the sum of squared distances from each point to its cluster center is minimized. The k-means problem with penalties, a generalization of the k-means problem, allows some points to be left unclustered, provided that a penalty is paid for each such point. In this paper, we study the k-means problem with penalties using a seeding algorithm. We show that the approximation factor depends only on the ratio of the maximal penalty value to the minimal one. When the penalty is uniform, the approximation factor reduces to the one for the k-means problem. Moreover, our result generalizes k-means++ for the k-means problem to the penalty version. Numerical experiments show that our seeding algorithm is more effective than the corresponding algorithm without seeding.
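For a fixed center set \(C\), each point \(x\) either joins its nearest center and pays \(d(x,C)^2\), or is left unclustered and pays its penalty \(p_x\). The cheaper option is always best, so the objective for \(C\) is \(\sum_x \min\{d(x,C)^2, p_x\}\). The Python sketch below computes this cost and shows one k-means++-style seeding that samples each new center with probability proportional to a point's current penalized cost. That sampling rule is an assumption chosen for illustration as a natural analogue of \(D^2\) sampling, not necessarily the exact rule analyzed in the paper. The names `point_cost`, `penalized_cost` and `penalized_seeding` are hypothetical.

```python
import random
from typing import List, Optional, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def point_cost(x: Point, centers: List[Point], penalty: float) -> float:
    """Penalized cost of one point: min(D(x)^2, p_x).

    With no centers chosen yet, every point is left unclustered and pays its penalty.
    """
    if not centers:
        return penalty
    return min(min(sq_dist(x, c) for c in centers), penalty)


def penalized_cost(points: List[Point], centers: List[Point],
                   penalties: List[float]) -> float:
    """Objective of k-means with penalties for a fixed center set."""
    return sum(point_cost(x, centers, p) for x, p in zip(points, penalties))


def penalized_seeding(points: List[Point], penalties: List[float], k: int,
                      rng: Optional[random.Random] = None) -> List[Point]:
    """Hypothetical k-means++-style seeding for the penalized problem.

    ASSUMPTION: the first center is chosen uniformly at random and each later
    center with probability proportional to point_cost(x, centers, p_x).
    This is an illustrative analogue of D^2 sampling, not necessarily the
    paper's exact rule.
    """
    if rng is None:
        rng = random.Random()
    centers = [points[rng.randrange(len(points))]]
    while len(centers) < k:
        weights = [point_cost(x, centers, p) for x, p in zip(points, penalties)]
        if sum(weights) == 0:
            break  # no point has positive cost left, so sampling is undefined
        idx = rng.choices(range(len(points)), weights=weights, k=1)[0]
        centers.append(points[idx])
    return centers


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0), (50.0, 50.0)]
    pens = [5.0, 5.0, 5.0, 5.0, 1.0]  # the far point is cheap to leave unclustered
    centers = penalized_seeding(pts, pens, k=2, rng=random.Random(0))
    print(centers, penalized_cost(pts, centers, pens))
```

With a uniform penalty larger than every squared distance, `point_cost` reduces to the ordinary \(D(x)^2\) weight, so the sketch falls back to plain k-means++ seeding, mirroring the uniform-penalty remark above.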


Fig. 1
Fig. 2

Acknowledgements

The research of the first author is supported by Higher Educational Science and Technology Program of Shandong Province (No. J17KA171). The second author is supported by National Natural Science Foundation of China (No. 11531014). The third author is supported by National Natural Science Foundation of China (Nos. 11626148 and 11701342) and Natural Science Foundation of Shandong Province (No. ZR2016AQ01). The fourth author’s research is supported by National Natural Science Foundation of China (No. 11871081). The fifth author is supported by National Natural Science Foundation of China (Nos. 61672323 and 61972228).

Author information

Corresponding author

Correspondence to Dongmei Zhang.

About this article

Cite this article

Li, M., Xu, D., Yue, J. et al. The seeding algorithm for k-means problem with penalties. J Comb Optim 39, 15–32 (2020). https://doi.org/10.1007/s10878-019-00450-w

  • DOI: https://doi.org/10.1007/s10878-019-00450-w
