An accelerated K-means clustering algorithm using selection and erasure rules

Journal of Zhejiang University SCIENCE C

Abstract

The K-means method is a well-known clustering algorithm with a wide range of applications, such as biological classification, disease analysis, data mining, and image compression. However, the plain K-means method becomes slow when the number of clusters or the number of data points is large. Fahim et al. (2006) presented a modified K-means algorithm that produced clusters whose mean square error was very close to that of plain K-means, but with a shorter execution time. In this study, we accelerate that algorithm further. Our method uses two rules: a selection rule, which picks a good candidate as the initial center to be checked, and an erasure rule, which deletes one or more unqualified centers whenever a specified condition is satisfied. Our clustering results are identical to those of Fahim et al. (2006), while our method further reduces computation time as the number of clusters increases. The mathematical reasoning behind the design is also included.
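The abstract describes the two rules only in outline, so the Python sketch below is an illustration under stated assumptions, not the authors' algorithm. It combines (a) our reading of the Fahim et al. (2006) shortcut, in which a point whose distance to its updated center has not grown keeps its cluster without a full search, and (b) a stand-in for the selection and erasure rules. The search starts from the point's previous center (selection), and any center c_j with d(c_best, c_j) ≥ 2·d(x, c_best) is deleted from the candidate list (erasure). By the triangle inequality such a center cannot be strictly closer to x than c_best. This triangle-inequality test is a well-known pruning technique substituted here for illustration only; the function names `nearest_with_pruning` and `kmeans` are hypothetical.

```python
import math
import random


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def nearest_with_pruning(x, centers, start):
    """Nearest center to x (hypothetical selection + erasure sketch).

    Selection (assumed): begin with centers[start], e.g. the previous center.
    Erasure (assumed, triangle-inequality stand-in): drop any center c_j with
    d(c_best, c_j) >= 2 * d(x, c_best); such a c_j cannot be strictly closer.
    """
    best = start
    best_d = math.dist(x, centers[best])
    candidates = [j for j in range(len(centers)) if j != best]
    while candidates:
        # Erase every candidate that provably cannot beat the current best.
        candidates = [j for j in candidates
                      if math.dist(centers[best], centers[j]) < 2 * best_d]
        if not candidates:
            break
        j = candidates.pop()
        d = math.dist(x, centers[j])
        if d < best_d:
            best, best_d = j, d
    return best, best_d


def kmeans(points, k, max_iter=100, seed=0):
    """K-means with a Fahim-style shortcut (our reading) and pruned search."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    # Initial assignment: full nearest-center search for every point.
    assign, dists = [], []
    for x in points:
        j, d = nearest_with_pruning(x, centers, 0)
        assign.append(j)
        dists.append(d)
    for _ in range(max_iter):
        # Recompute each center as the mean of its members (keep it if empty).
        new_centers = []
        for j in range(k):
            members = [x for x, a in zip(points, assign) if a == j]
            new_centers.append(mean(members) if members else centers[j])
        changed = False
        for i, x in enumerate(points):
            d_same = math.dist(x, new_centers[assign[i]])
            if d_same <= dists[i]:
                # Heuristic shortcut: the point did not move away from its
                # center, so it keeps its cluster without a full search.
                dists[i] = d_same
                continue
            j, d = nearest_with_pruning(x, new_centers, assign[i])
            changed = changed or j != assign[i]
            assign[i], dists[i] = j, d
        centers = new_centers
        if not changed:
            break
    return centers, assign
```

For example, `kmeans([(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)], 2)` should separate the two groups of three points. The erasure test never changes which center is nearest, but the shortcut in (a) is a heuristic, so results can differ slightly from plain K-means. This is consistent with the abstract's statement that the mean square errors are very similar rather than equal.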

References

  • Chen, L.S.T., Su, W.K., Lin, J.C., 2009. Secret image sharing based on vector quantization. Int. J. Circ. Syst. Signal Process., 3(3):137–144.

  • Crespo, F., Weber, R., 2005. A methodology for dynamic data mining based on fuzzy clustering. Fuzzy Sets Syst., 150(2):267–284. [doi:10.1016/j.fss.2004.03.028]

  • Fahim, A.M., Salem, A.M., Torkey, F.A., Ramadan, M.A., 2006. An efficient enhanced k-means clustering algorithm. J. Zhejiang Univ.-Sci. A, 7(10):1626–1633. [doi:10.1631/jzus.2006.A1626]

  • Frank, A., Asuncion, A., 2010. UCI Machine Learning Repository. Schools of Information and Computer Science, University of California, Irvine, CA. Available from http://archive.ics.uci.edu/ml [Accessed on July 19, 2012].

  • Kong, W.Z., Zhu, S.A., 2007. Multi-face detection based on downsampling and modified subtractive clustering for color images. J. Zhejiang Univ.-Sci. A, 8(1):72–78. [doi:10.1631/jzus.2007.A0072]

  • Lee, W.J., Chung, J.S., Ouyang, C.S., Lee, S.J., 2007. Vector quantization of images using a fuzzy clustering method. Cybern. Syst., 39(1):45–60. [doi:10.1080/01969720701710139]

  • Leng, J., Hong, T.P., 2010. Mining outliers in correlated subspaces for high dimensional data sets. Fundam. Inform., 98(1):71–86. [doi:10.3233/FI-2010-217]

  • Lin, H.J., Yan, F.W., Kao, Y.T., 2005. An efficient GA-based clustering technique. Tamkang J. Sci. Eng., 8(2):113–122.

  • Lin, J.C., 1996. Multi-class clustering by analytical two-class formulas. Int. J. Pattern Recogn. Artif. Intell., 10(4):307–323. [doi:10.1142/S0218001496000220]

  • Lu, J.F., Tang, J.B., Tang, Z.M., Yang, J.Y., 2008. Hierarchical initialization approach for K-means clustering. Pattern Recogn. Lett., 29(6):787–795. [doi:10.1016/j.patrec.2007.12.009]

  • Mahajan, M., Nimbhorkar, P., Varadarajan, K., 2009. The planar k-means problem is NP-hard. 3rd Int. Workshop on Algorithms and Computation, p.274–285. [doi:10.1007/978-3-642-00202-1_24]

  • Seligson, D.B., Horvath, S., Shi, T., Yu, H., Tze, S., Grunstein, M., Kurdistani, S.K., 2005. Global histone modification patterns predict risk of prostate cancer recurrence. Nature, 435(7046):1262–1266. [doi:10.1038/nature03672]

  • Theodoridis, S., Koutroumbas, K., 2009. Chapter 13—Clustering Algorithms II: Hierarchical Algorithms. In: Pattern Recognition (4th Ed.). Academic Press, Elsevier, London, p.653–700. [doi:10.1016/B978-1-59749-272-0.50015-3]

  • Wang, R.Z., Tsai, Y.D., 2007. An image-hiding method with high hiding capacity based on best-block matching and k-means clustering. Pattern Recogn., 40(2):398–409. [doi:10.1016/j.patcog.2006.07.015]

  • Wittkop, T., Emig, D., Lange, S., Rahmann, S., Albrecht, M., Morris, J.H., Böcker, S., Stoye, J., Baumbach, J., 2010. Partitioning biological data with transitivity clustering. Nat. Methods, 7(6):419–420. [doi:10.1038/nmeth0610-419]

  • Yue, S.H., Li, P., Guo, J.D., Zhou, S.G., 2005. A statistical information-based clustering approach in distance space. J. Zhejiang Univ.-Sci., 6A(1):71–78. [doi:10.1631/jzus.2005.A0071]

Author information

Correspondence to Suiang-Shyan Lee.

Additional information

Project (No. 100-2221-E-009-141-MY3) supported by the National Science Council

About this article

Cite this article

Lee, SS., Lin, JC. An accelerated K-means clustering algorithm using selection and erasure rules. J. Zhejiang Univ. - Sci. C 13, 761–768 (2012). https://doi.org/10.1631/jzus.C1200078

  • DOI: https://doi.org/10.1631/jzus.C1200078