An accelerated K-means clustering algorithm using selection and erasure rules

Lee, Suiang-Shyan; Lin, Ja-Chen

doi:10.1631/jzus.C1200078

Published: 10 October 2012

Volume 13, pages 761–768 (2012)

Journal of Zhejiang University SCIENCE C


Abstract

The K-means method is a well-known clustering algorithm with an extensive range of applications, such as biological classification, disease analysis, data mining, and image compression. However, the plain K-means method is not fast when the number of clusters or the number of data points becomes large. A modified K-means algorithm was presented by Fahim et al. (2006). The modified algorithm produced clusters whose mean square error was very similar to that of the plain K-means, but the execution time was shorter. In this study, we try to further increase its speed. There are two rules in our method: a selection rule, used to acquire a good candidate as the initial center to be checked, and an erasure rule, used to delete one or many unqualified centers each time a specified condition is satisfied. Our clustering results are identical to those of Fahim et al. (2006). However, our method further cuts computation time when the number of clusters increases. The mathematical reasoning used in our design is included.
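
The kind of saving the abstract refers to can be illustrated with a minimal sketch. It assumes a distance-caching shortcut of the sort usually attributed to Fahim et al. (2006): a point whose distance to its own updated center has not grown keeps its label, so the search over all centers is skipped. This is not the selection rule or the erasure rule of the present paper, whose details are not given in this preview. The function names and the toy data are hypothetical.

```python
import math


def dist(p, q):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans_with_shortcut(points, centers, max_iter=100):
    """Lloyd-style K-means with a distance-caching shortcut (assumed form).

    Returns (labels, centers, full_searches), where full_searches counts
    how often a point had to be compared against every center.
    """
    centers = [tuple(c) for c in centers]
    labels = [None] * len(points)
    cached = [math.inf] * len(points)  # last known distance to own center
    full_searches = 0
    for _ in range(max_iter):
        for i, p in enumerate(points):
            if labels[i] is not None:
                d_own = dist(p, centers[labels[i]])
                if d_own <= cached[i]:
                    # Own center did not move away: keep the label (heuristic).
                    cached[i] = d_own
                    continue
            # Otherwise fall back to the plain search over all centers.
            full_searches += 1
            best = min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            labels[i] = best
            cached[i] = dist(p, centers[best])
        # Update step: recompute each center; an empty cluster keeps its center.
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(mean(members) if members else c)
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers, full_searches
```

Because the shortcut is a heuristic, its partition can differ slightly from that of plain K-means, which is consistent with the abstract's description of the mean square error as very similar rather than identical. Plain K-means would perform one full search per point per iteration, so the `full_searches` counter gives a rough measure of the work avoided.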

References

Chen, L.S.T., Su, W.K., Lin, J.C., 2009. Secret image sharing based on vector quantization. Int. J. Circ. Syst. Signal Process., 3(3):137–144.
Crespo, F., Weber, R., 2005. A methodology for dynamic data mining based on fuzzy clustering. Fuzzy Sets Syst., 150(2):267–284. [doi:10.1016/j.fss.2004.03.028]
Fahim, A.M., Salem, A.M., Torkey, F.A., Ramadan, M.A., 2006. An efficient enhanced k-means clustering algorithm. J. Zhejiang Univ.-Sci. A, 7(10):1626–1633. [doi:10.1631/jzus.2006.A1626]
Frank, A., Asuncion, A., 2010. UCI Machine Learning Repository. Schools of Information and Computer Science, University of California, Irvine, CA. Available from http://archive.ics.uci.edu/ml [Accessed on July 19, 2012].
Kong, W.Z., Zhu, S.A., 2007. Multi-face detection based on downsampling and modified subtractive clustering for color images. J. Zhejiang Univ.-Sci. A, 8(1):72–78. [doi:10.1631/jzus.2007.A0072]
Lee, W.J., Chung, J.S., Ouyang, C.S., Lee, S.J., 2007. Vector quantization of images using a fuzzy clustering method. Cybern. Syst., 39(1):45–60. [doi:10.1080/01969720701710139]
Leng, J., Hong, T.P., 2010. Mining outliers in correlated subspaces for high dimensional data sets. Fundam. Inform., 98(1):71–86. [doi:10.3233/FI-2010-217]
Lin, H.J., Yan, F.W., Kao, Y.T., 2005. An efficient GA-based clustering technique. Tamkang J. Sci. Eng., 8(2):113–122.
Lin, J.C., 1996. Multi-class clustering by analytical two-class formulas. Int. J. Pattern Recogn. Artif. Intell., 10(4):307–323. [doi:10.1142/S0218001496000220]
Lu, J.F., Tang, J.B., Tang, Z.M., Yang, J.Y., 2008. Hierarchical initialization approach for K-means clustering. Pattern Recogn. Lett., 29(6):787–795. [doi:10.1016/j.patrec.2007.12.009]
Mahajan, M., Nimbhorkar, P., Varadarajan, K., 2009. The Planar K-means Problem is NP-Hard. 3rd Int. Workshop on Algorithms and Computation, p.274–285. [doi:10.1007/978-3-642-00202-1_24]
Seligson, D.B., Horvath, S., Shi, T., Yu, H., Tze, S., Grunstein, M., Kurdistani, S.K., 2005. Global histone modification patterns predict risk of prostate cancer recurrence. Nature, 435(7046):1262–1266. [doi:10.1038/nature03672]
Theodoridis, S., Koutroumbas, K., 2009. Chapter 13—Clustering Algorithms II: Hierarchical Algorithms. In: Pattern Recognition (4th Ed.). Academic Press, Elsevier, London, p.653–700. [doi:10.1016/B978-1-59749-272-0.50015-3]
Wang, R.Z., Tsai, Y.D., 2007. An image-hiding method with high hiding capacity based on best-block matching and k-means clustering. Pattern Recogn., 40(2):398–409. [doi:10.1016/j.patcog.2006.07.015]
Wittkop, T., Emig, D., Lange, S., Rahmann, S., Albrecht, M., Morris, J.H., Bocker, S., Stoye, J., Baumbach, J., 2010. Partitioning biological data with transitivity clustering. Nat. Methods, 7(6):419–420. [doi:10.1038/nmeth0610-419]
Yue, S.H., Li, P., Guo, J.D., Zhou, S.G., 2005. A statistical information-based clustering approach in distance space. J. Zhejiang Univ.-Sci., 6A(1):71–78. [doi:10.1631/jzus.2005.A0071]

Author information

Authors and Affiliations

Department of Computer Science, National Chiao Tung University, Taiwan, 30050, Hsinchu
Suiang-Shyan Lee & Ja-Chen Lin

Corresponding author

Correspondence to Suiang-Shyan Lee.

Additional information

Project (No. 100-2221-E-009-141-MY3) supported by the National Science Council

About this article

Cite this article

Lee, SS., Lin, JC. An accelerated K-means clustering algorithm using selection and erasure rules. J. Zhejiang Univ. - Sci. C 13, 761–768 (2012). https://doi.org/10.1631/jzus.C1200078

Received: 22 March 2012
Accepted: 23 July 2012
Published: 10 October 2012
Issue Date: October 2012
DOI: https://doi.org/10.1631/jzus.C1200078

Key words

CLC number

TP301.6
