Fast Estimation for the Number of Clusters

Zhang, Xiaohong; He, Zhenzhen; Jia, Zongpu; Ren, Jianji

doi:10.1007/978-3-030-63941-9_27

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 337)

Included in the following conference series:

International Conference on 5G for Future Wireless Networks

Abstract

Clustering analysis is widely used in many areas. In many cases, the number of clusters must be assigned manually, and an inappropriate assignment degrades the analysis. Many solutions have been proposed to estimate the optimal number of clusters. However, the accuracy of those solutions drops severely on overlapping data sets. To address this accuracy problem, we propose a fast estimation solution based on cluster centers selected in a static way. In the solution, each data point is assigned a score calculated according to a density-distance model. The score of a data point does not change once it has been generated. The solution takes the k data points with the highest scores as the centers of k clusters. It uses the significant change in the minimal distance between cluster centers to identify the optimal number of clusters in overlapping data sets. The experimental results verify the usefulness and effectiveness of our solution.
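
The abstract does not give the formulas behind the density-distance model, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a density-peaks-style score (a Gaussian-kernel local density multiplied by the distance to the nearest denser point), and it interprets "significant change" as the largest relative drop in the minimal center-to-center distance when one more center is added. The function names and the parameters `cutoff` and `k_max` are hypothetical.

```python
import math


def pairwise_distances(points):
    """Full Euclidean distance matrix for a list of coordinate tuples."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def static_scores(dist, cutoff):
    """Score each point once; the scores are never updated afterwards.

    Assumption: score = density * distance, in the style of density-peaks
    clustering. The abstract only says "density-distance model".
    """
    n = len(dist)
    # Assumed density: Gaussian-kernel sum over all other points.
    rho = [
        sum(math.exp(-((dist[i][j] / cutoff) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]
    scores = []
    for i in range(n):
        # Distance to the nearest strictly denser point; a densest point
        # falls back to its largest distance to any point.
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta = min(denser) if denser else max(dist[i])
        scores.append(rho[i] * delta)
    return scores


def estimate_k(points, cutoff, k_max):
    """Return (k, centers) where k is the estimated number of clusters."""
    dist = pairwise_distances(points)
    scores = static_scores(dist, cutoff)
    # Static ranking: the top-k points by score serve as the k centers.
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)

    k_max = min(k_max, len(points) - 1)
    if k_max < 2:
        raise ValueError("need at least 3 points and k_max >= 2")

    # min_dist[k] = smallest distance between any two of the top-k centers,
    # updated incrementally as each new center is added.
    min_dist = {}
    current = math.inf
    for k in range(2, k_max + 2):
        new = ranked[k - 1]
        current = min([current] + [dist[new][ranked[j]] for j in range(k - 1)])
        min_dist[k] = current

    # Assumed criterion: pick the k after which the minimal center distance
    # collapses the most (largest ratio min_dist[k] / min_dist[k + 1]).
    best_k = max(
        range(2, k_max + 1),
        key=lambda k: min_dist[k] / max(min_dist[k + 1], 1e-12),
    )
    return best_k, [points[i] for i in ranked[:best_k]]
```

Calling `estimate_k(points, cutoff=1.0, k_max=6)` on a list of 2-D tuples returns the estimated k and the chosen center points. Because the scores are computed once, trying each candidate k only extends the ranked list by one point. This sketch cannot report k = 1, and a sensible `cutoff` depends on the scale of the data.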

Acknowledgement

This work is supported by the National Natural Science Foundation of China (61602156 and 61433012), the Scientific and Technological Project of Henan Province (172102310677), the Basic and Frontier Technology Project of Henan Province (142300410147), and the PhD Foundation of Henan Polytechnic University (B2012-099).

Author information

Authors and Affiliations

College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, 45400, Henan, China
Xiaohong Zhang, Zhenzhen He, Zongpu Jia & Jianji Ren

Corresponding author

Correspondence to Jianji Ren.

Editor information

Editors and Affiliations

College of Intelligence and Computing, Tianjin University, Tianjin, Tianjin, China
Xiaofei Wang
Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, BC, Canada
Victor C. M. Leung
School of Computer Science and Technology, Tianjin University, Tianjin, Tianjin, China
Keqiu Li
University of Science and Technology, Beijing, China
Haijun Zhang
Shenzhen University, Shenzhen, China
Xiping Hu
National University of Defense Technology, Changsha, China
Qiang Liu

About this paper

Cite this paper

Zhang, X., He, Z., Jia, Z., Ren, J. (2020). Fast Estimation for the Number of Clusters. In: Wang, X., Leung, V.C.M., Li, K., Zhang, H., Hu, X., Liu, Q. (eds) 6GN for Future Wireless Networks. 6GN 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 337. Springer, Cham. https://doi.org/10.1007/978-3-030-63941-9_27

DOI: https://doi.org/10.1007/978-3-030-63941-9_27
Published: 29 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63940-2
Online ISBN: 978-3-030-63941-9
eBook Packages: Computer Science, Computer Science (R0)
