The seeding algorithm for k-means problem with penalties

Li, Min; Xu, Dachuan; Yue, Jun; Zhang, Dongmei; Zhang, Peng

doi:10.1007/s10878-019-00450-w

Published: 26 September 2019

Volume 39, pages 15–32, (2020)

Journal of Combinatorial Optimization

Abstract

The k-means problem is a classic NP-hard problem in machine learning and computational geometry. Its goal is to partition a given point set into k clusters so as to minimize the sum of squared distances from each point to its cluster center. The k-means problem with penalties, a generalization of the k-means problem, allows some points to remain unclustered at the cost of paying a penalty. In this paper, we study the k-means problem with penalties by using the seeding algorithm. We show that the approximation guarantee depends only on the ratio of the maximal penalty value to the minimal one. When the penalty is uniform, the approximation factor reduces to the one known for the k-means problem. Our result thus generalizes k-means++ for the k-means problem to the penalty version. Numerical experiments show that our seeding algorithm is more effective than the same procedure run without seeding.
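
The abstract does not spell out the sampling rule, so the following is only a minimal sketch of how a penalty-aware seeding step might look. It assumes that each point pays the smaller of its squared distance to the nearest chosen center and its own penalty, and that new centers are sampled with probability proportional to that capped cost, by analogy with the D² sampling of k-means++. The function names (`sq_dist`, `point_cost`, `penalized_cost`, `seed_with_penalties`) are hypothetical and are not taken from the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def point_cost(point, penalty, centers):
    """Cost of one point: squared distance to its nearest center,
    capped by the point's penalty (assumed objective)."""
    nearest = min(sq_dist(point, c) for c in centers)
    return min(nearest, penalty)


def penalized_cost(points, penalties, centers):
    """Total cost: each point is either served by a center or penalized."""
    return sum(point_cost(x, p, centers) for x, p in zip(points, penalties))


def seed_with_penalties(points, penalties, k, rng=None):
    """Pick up to k initial centers.

    Assumption: the first center is uniform at random; each later center
    is drawn with probability proportional to the point's capped cost.
    """
    rng = rng or random.Random()
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [point_cost(x, p, centers) for x, p in zip(points, penalties)]
        if sum(weights) == 0:
            break  # every point already has zero cost; nothing left to sample
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (100.0, 100.0)]
    pens = [5.0] * len(pts)  # uniform penalties
    chosen = seed_with_penalties(pts, pens, 2, random.Random(0))
    print(chosen, penalized_cost(pts, pens, chosen))
```

Capping each sampling weight at the penalty keeps a far-away outlier from dominating the draw, which is one plausible reason the guarantee could depend on the spread of penalty values rather than on the distances of outliers. With uniform, very large penalties the cap never binds and the sketch behaves like ordinary k-means++ seeding.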

References

Aggarwal A, Deshpande A, Kannan R (2009) Adaptive sampling for \(k\)-means clustering. In: Proceedings of APPROX and RANDOM, pp. 15–28
Ahmadian S, Norouzi-Fard A, Svensson O, Ward J (2017) Better guarantees for \(k\)-means and Euclidean \(k\)-median by primal–dual algorithms. In: Proceedings of FOCS, pp. 61–72
Aloise D, Deshpande A, Hansen P, Popat P (2009) NP-hardness of Euclidean sum-of-squares clustering. Mach Learn 75:245–248
Arthur D, Vassilvitskii S (2007) \(k\)-means++: The advantages of careful seeding. In: Proceedings of SODA, pp. 1027–1035
Awasthi P, Charikar M, Krishnaswamy R, Sinop AK (2015) The hardness of approximation of Euclidean \(k\)-means. In: Proceedings of SoCG, pp. 754–767
Bachem O, Lucic M, Hassani SH, Krause A (2016a) Approximate \(k\)-means++ in sublinear time. In: Proceedings of AAAI, pp. 1459–1467
Bachem O, Lucic M, Hassani SH, Krause A (2016b) Fast and provably good seedings for \(k\)-means. In: Proceedings of NIPS, pp. 55–63
Bachem O, Lucic M, Krause A (2017) Distributed and provably good seedings for \(k\)-means in constant rounds. In: Proceedings of ICML, pp. 292–300
Bahmani B, Moseley B, Vattani A, Kumar R, Vassilvitskii S (2012) Scalable \(k\)-means++. In: Proceedings of the VLDB endowment, pp. 622–633
Blömer J, Lammersen C, Schmidt M, Sohler C (2016) Theoretical analysis of the \(k\)-means algorithm–a survey. In: Kliemann L, Sanders P (eds) Algorithm engineering, Springer, New York, pp. 81–116
Chang XY, Wang Y, Li R, Xu Z (2014) Sparse \(k\)-means with \(l_\infty\)/\(l_0\) penalty for high-dimensional data clustering. arXiv:1403.7890v1
Drineas P, Frieze A, Kannan R, Vempala S, Vinay V (2004) Clustering large graphs via the singular value decomposition. Mach Learn 56:9–33
Har-Peled S, Sadri B (2005) How fast is the \(k\)-means method? In: Proceedings of SODA, pp. 332–229
Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverman R, Wu AY (2004) A local search approximation algorithm for \(k\)-means clustering. Comput Geom Theory Appl 28:89–112
Lee E, Schmidt M, Wright J (2017) Improved and simplified inapproximability for \(k\)-means. Inf Process Lett 120:40–43
Lloyd S (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28:21–33
Ostrovsky R, Rabani Y, Schulman L, Swamy C (2012) The effectiveness of Lloyd-type methods for the \(k\)-means problem. J ACM 59:28:1–28:22
Rezaei M, Fränti P (2016) Set matching measures for external cluster validity. IEEE Trans Knowl Data Eng 28:2173–2186
Tseng GC (2007) Penalized and weighted \(k\)-means for clustering with scattered objects and prior information in high-throughput biological data. Bioinformatics 23:2247–2255
Xu D, Xu Y, Zhang D (2017) A survey on algorithm for \(k\)-means problem and its variants. Oper Res Trans 21:101–109
Zhang D, Hao C, Wu C, Xu D, Zhang Z (2017) A local search approximation algorithm for the \(k\)-means problem with penalties. In: Proceedings of COCOON, pp. 568–574

Acknowledgements

The research of the first author is supported by Higher Educational Science and Technology Program of Shandong Province (No. J17KA171). The second author is supported by National Natural Science Foundation of China (No. 11531014). The third author is supported by National Natural Science Foundation of China (Nos. 11626148 and 11701342) and Natural Science Foundation of Shandong Province (No. ZR2016AQ01). The fourth author’s research is supported by National Natural Science Foundation of China (No. 11871081). The fifth author is supported by National Natural Science Foundation of China (Nos. 61672323 and 61972228).

Author information

Authors and Affiliations

School of Mathematics and Statistics, Shandong Normal University, Jinan, 250014, People’s Republic of China
Min Li & Jun Yue
Department of Operations Research and Scientific Computing, College of Applied Sciences, Beijing University of Technology, Beijing, 100124, People’s Republic of China
Dachuan Xu
School of Computer Science and Technology, Shandong Jianzhu University, Jinan, 250101, People’s Republic of China
Dongmei Zhang
School of Software, Shandong University, Jinan, 250101, People’s Republic of China
Peng Zhang

Corresponding author

Correspondence to Dongmei Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, M., Xu, D., Yue, J. et al. The seeding algorithm for k-means problem with penalties. J Comb Optim 39, 15–32 (2020). https://doi.org/10.1007/s10878-019-00450-w

Issue Date: January 2020
DOI: https://doi.org/10.1007/s10878-019-00450-w
