
A Novel Method for Identifying Optimal Number of Clusters with Marginal Differential Entropy

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7901)

Abstract

Clustering evaluation plays an important role in clustering algorithms. Most recent approaches that evaluate clusterings and identify the optimal number of clusters need to compute pairwise distances between data points or estimate entropy over the full-dimensional space, and therefore have high computational complexity. In this paper, we propose an entropy-based clustering evaluation method for identifying the optimal number of clusters. It first projects the cluster centroids onto each individual dimension, then accumulates the marginal differential entropy across dimensions. Using the sum of marginal entropies, we can analyze clustering performance and identify the optimal number of clusters. This method dramatically reduces computational complexity without losing accuracy. Experimental results show that the proposed method is highly stable under a variety of conditions and scales to massive, high-dimensional data.
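The abstract's core idea (project the cluster centroids onto each dimension, then sum the per-dimension marginal differential entropies) can be sketched roughly as follows. The chapter's exact estimator is behind the paywall, so this is only a minimal illustrative sketch: it assumes a histogram-based differential-entropy estimate, and `marginal_entropy`, `sum_marginal_entropy`, and the `bins` parameter are hypothetical names introduced here, not the authors' code.

```python
# Hedged sketch of the abstract's idea, NOT the authors' implementation.
# Assumption: differential entropy per dimension is estimated from a histogram.
import numpy as np

def marginal_entropy(values, bins=16):
    """Histogram estimate of the differential entropy of a 1-D sample.

    H ≈ -sum_i p_i * log(p_i / w_i), where p_i is the empirical bin
    probability and w_i the bin width (the entropy of the histogram density).
    """
    counts, edges = np.histogram(values, bins=bins)
    widths = np.diff(edges)
    p = counts / counts.sum()
    nz = p > 0  # skip empty bins to avoid log(0)
    return float(-np.sum(p[nz] * np.log(p[nz] / widths[nz])))

def sum_marginal_entropy(centroids, bins=16):
    """Project centroids onto each dimension and accumulate marginal entropies.

    `centroids` is a (k, d) array, e.g. the centroid matrix returned by a
    k-means run for a candidate k; comparing this sum across candidate k
    values is the kind of evaluation curve the abstract describes.
    """
    centroids = np.asarray(centroids, dtype=float)
    return sum(marginal_entropy(centroids[:, d], bins=bins)
               for d in range(centroids.shape[1]))
```

Because each dimension is handled independently, the cost is linear in the number of dimensions rather than quadratic in the number of data points, which is consistent with the complexity reduction the abstract claims.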





Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shu, B., Chen, W., Niu, Z., Zhang, C., Jiang, X. (2013). A Novel Method for Identifying Optimal Number of Clusters with Marginal Differential Entropy. In: Gao, Y., et al. Web-Age Information Management. WAIM 2013. Lecture Notes in Computer Science, vol 7901. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39527-7_36


  • DOI: https://doi.org/10.1007/978-3-642-39527-7_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39526-0

  • Online ISBN: 978-3-642-39527-7

  • eBook Packages: Computer Science (R0)
