Abstract
Clustering analysis has been widely used in many areas. In many cases, the number of clusters must be specified manually, and an inappropriate choice degrades the analysis. Many solutions have been proposed to estimate the optimal number of clusters; however, their accuracy drops severely on overlapping data sets. To address this problem, we propose a fast estimation solution based on cluster centers selected in a static way. In the solution, each data point is assigned a score calculated from a density-distance model, and the score of a data point does not change once it has been generated. The solution takes the k data points with the highest scores as the centers of k clusters, and it uses the significant change in the minimal distance between cluster centers to identify the optimal number of clusters in overlapping data sets. The experimental results verify the usefulness and effectiveness of our solution.
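The static center-selection idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not define the density-distance model, so the score here is assumed to be the product of a Gaussian-kernel local density ρ and the distance δ to the nearest denser point, in the spirit of density-peaks clustering (Rodriguez and Laio). The cutoff distance `cutoff`, the upper bound `k_max`, and the rule that treats the largest ratio `dmin[k] / dmin[k + 1]` as the "significant change" are likewise hypothetical choices, and the function names `density_distance_scores` and `estimate_k` are illustrative only.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def density_distance_scores(points: Sequence[Point], cutoff: float) -> List[float]:
    """Static score per point: rho * delta (assumed density-peaks-style model).

    rho   : Gaussian-kernel local density, sum of exp(-(d / cutoff)^2) over other points.
    delta : distance to the nearest point of higher density (ties broken by index);
            the single densest point gets its largest distance to any point instead.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(math.exp(-(dist[i][j] / cutoff) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    scores = []
    for i in range(n):
        # Points that rank above i in the total order (rho, then lower index).
        higher = [dist[i][j] for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        delta = min(higher) if higher else max(dist[i])
        scores.append(rho[i] * delta)
    return scores


def estimate_k(points: Sequence[Point], cutoff: float,
               k_max: int) -> Tuple[int, Dict[int, float]]:
    """Estimate the number of clusters from the minimal inter-center distance.

    The centers for every k are simply the top-k scored points, so the scores
    are computed once and never updated. Returns (k_hat, dmin), where dmin[k]
    is the minimal pairwise distance among the top-k centers.
    """
    if not 2 <= k_max < len(points):
        raise ValueError("need 2 <= k_max < number of points")
    scores = density_distance_scores(points, cutoff)
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)

    dmin: Dict[int, float] = {}
    for k in range(2, k_max + 2):
        centers = [points[i] for i in order[:k]]
        dmin[k] = min(math.dist(a, b) for a, b in combinations(centers, 2))

    # Hypothetical "significant change" rule: pick the k after which adding
    # one more center shrinks the minimal distance by the largest factor.
    def drop(k: int) -> float:
        return dmin[k] / dmin[k + 1] if dmin[k + 1] > 0 else math.inf

    best_k = max(range(2, k_max + 1), key=drop)
    return best_k, dmin
```

Because the centers for k + 1 are the centers for k plus one more point, `dmin` can only stay the same or shrink as k grows. A sharp drop suggests that the newly added center lies inside a cluster that already has a center, which is why the sketch reports the k just before the largest drop.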
Acknowledgement
This work is supported by the National Natural Science Foundation of China (61602156 and 61433012), the Scientific and Technological Project of Henan Province (172102310677), the Basic and Frontier Technology Project of Henan Province (142300410147), and the PhD Foundation of Henan Polytechnic University (B2012-099).
Copyright information
© 2020 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Zhang, X., He, Z., Jia, Z., Ren, J. (2020). Fast Estimation for the Number of Clusters. In: Wang, X., Leung, V.C.M., Li, K., Zhang, H., Hu, X., Liu, Q. (eds) 6GN for Future Wireless Networks. 6GN 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 337. Springer, Cham. https://doi.org/10.1007/978-3-030-63941-9_27
DOI: https://doi.org/10.1007/978-3-030-63941-9_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63940-2
Online ISBN: 978-3-030-63941-9
eBook Packages: Computer Science, Computer Science (R0)