Abstract
K-means is the most well-known and widely used classical clustering method, owing to its efficiency and ease of implementation. However, k-means has three main drawbacks: its final results depend heavily on the selection of the initial cluster centers, the number of clusters must be specified in advance, and it can only find clusters of similar sizes. Much work has been done on improving the selection of the initial cluster centers and on determining the number of clusters, but very little on enabling k-means to deal with clusters of different sizes. In this paper, we propose a new clustering method, called k-normal, whose main idea is to learn the cluster sizes in the same process that learns the cluster centers. The proposed k-normal method can identify clusters of different sizes while retaining the efficiency of k-means. Although the Expectation Maximization (EM) method based on Gaussian mixture models can also identify clusters of different sizes, it has a much higher computational complexity than both k-normal and k-means. Experiments on a synthetic dataset and seven real datasets show that k-normal outperforms k-means on all the datasets. Compared with the EM method, k-normal produces better results on six of the eight datasets while being far more efficient.
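The abstract states only the core idea (learn a per-cluster size alongside each center) and does not give the update rules, so the following is a minimal illustrative sketch, not the authors' algorithm. It assumes "size" is modeled as a spherical per-cluster variance and uses a hard-assignment analogue of Lloyd's iteration, with points scored by a size-normalized distance; the function name and all update formulas are hypothetical.

```python
import numpy as np

def k_normal_sketch(X, k, n_iter=100, seed=0):
    """Illustrative sketch: k-means-style hard assignment, but each
    cluster also carries a learned spherical variance ("size"). This is
    an assumed realization of the abstract's idea, akin to a
    hard-assignment spherical Gaussian mixture."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    sigma2 = np.full(k, X.var())  # initial per-cluster size
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Score each point against each cluster by a size-normalized
        # distance (spherical-Gaussian negative log-likelihood, up to
        # constants), so a "large" cluster can claim far-away points
        # that plain Euclidean k-means would mis-assign.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        score = d2 / sigma2 + d * np.log(sigma2)
        labels = score.argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts) > 0:
                centers[j] = pts.mean(axis=0)
                # Cluster "size": mean squared deviation per coordinate.
                sigma2[j] = max(
                    ((pts - centers[j]) ** 2).sum() / (len(pts) * d), 1e-9)
    return centers, sigma2, labels
```

On data with one broad and one tight cluster, plain k-means tends to slice off the edge of the broad cluster; the size-normalized score above lets the broad cluster keep its outlying points while the per-iteration cost stays linear in the number of points, in contrast to a full EM fit.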
References
Omran, M.G., Engelbrecht, A.P., Salman, A.: An overview of clustering methods. Intell. Data Anal. 11, 583–605 (2007)
Kleinberg, J.: An impossibility theorem for clustering. In: Advances in Neural Information Processing Systems (NIPS) 15, Vancouver, British Columbia, Canada, 9–14 December, pp. 463–470 (2002)
Steinhaus, H.: Sur la division des corps matériels en parties. Bull. Acad. Polon. Sci. (in French) 4, 801–804 (1957)
Jain, A.K.: Data clustering: 50 years beyond k-means. Pattern Recogn. Lett. 31, 651–666 (2010)
Arthur, D., Vassilvitskii, S.: How slow is the k-means method? In: Proceedings of the Twenty-Second Annual Symposium on Computational Geometry, pp. 144–153 (2006)
Har-Peled, S., Sadri, B.: How fast is the k-means method? In: Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, Society for Industrial and Applied Mathematics, USA (2005)
MacKay, D.: An example inference task: clustering. In: Information Theory, Inference and Learning Algorithms, Cambridge University Press (2003)
Bezdek, J.: A convergence theorem for the fuzzy isodata clustering algorithms. Pattern Anal. Mach. Intell. 2, 1–8 (1980)
Schaefer, G., Zhou, H.: Fuzzy clustering for colour reduction in images. Telecommun. Syst. 40(1-2), 17–25 (2009)
Zhou, H., Sadka, A.H., Swash, M.R., Azizi, J., Sadiq, U.A.: Feature extraction and clustering for dynamic video summarisation. Neurocomputing 73(10–12), 1718–1729 (2010)
Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035 (2007)
Erisoglu, M., Calis, N., Sakallioglu, S.: A new algorithm for initial cluster centers in k-means algorithm. Pattern Recogn. Lett. 32, 1701–1705 (2011)
Hung, M.C., Wu, J., Chang, J.H., Yang, D.L.: An efficient k-Means clustering algorithm using simple partitioning. J. Inf. Sci. Eng. 21, 1157–1177 (2005)
Pelleg, D., Moore, A.W.: X-means: extending k-means with efficient estimation of the number of clusters. In: Proceedings of the Seventeenth International Conference on Machine Learning, pp. 727–734 (2000)
Liao, K.Y., Liu, G.Z., Xiao, L., Liu, C.T.: A sample-based hierarchical adaptive k-means clustering method for large-scale video retrieval. Knowl.-Based Syst. 49, 123–133 (2013)
Redner, R.A., Walker, H.F.: Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev. 26, 195–239 (1984)
UCI Machine Learning Repository. http://archive.ics.uci.edu/ml (2017)
Fowlkes, E.B., Mallows, C.L.: A method for comparing two hierarchical clusterings. J. Am. Stat. Assoc. 78, 553–569 (1983)
Acknowledgment
This work is supported by the National Natural Science Foundation of China (Grant No. 61272213) and the Fundamental Research Funds for the Central Universities (Grant No. lzujbky-2016-k07).
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Lu, Y., Qiao, J., Wang, X. (2017). K-normal: An Improved K-means for Dealing with Clusters of Different Sizes. In: Huang, DS., Hussain, A., Han, K., Gromiha, M. (eds) Intelligent Computing Methodologies. ICIC 2017. Lecture Notes in Computer Science, vol 10363. Springer, Cham. https://doi.org/10.1007/978-3-319-63315-2_29
DOI: https://doi.org/10.1007/978-3-319-63315-2_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63314-5
Online ISBN: 978-3-319-63315-2
eBook Packages: Computer Science, Computer Science (R0)