
Fast Estimation for the Number of Clusters

  • Conference paper
6GN for Future Wireless Networks (6GN 2020)

Abstract

Cluster analysis is widely used in many areas. In many cases the number of clusters must be specified manually, and an inappropriate choice degrades the analysis. Many solutions have been proposed to estimate the optimal number of clusters. However, the accuracy of those solutions drops severely on overlapping data sets. To address this accuracy problem, we propose a fast estimation solution based on cluster centers selected in a static way. In the solution, each data point is assigned a score calculated according to a density-distance model. Once generated, the score of a data point never changes. The solution takes the k data points with the highest scores as the centers of k clusters. It uses the significant change in the minimal distance between cluster centers to identify the optimal number of clusters in overlapping data sets. The experimental results verify the usefulness and effectiveness of our solution.
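
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the score formula or the rule for detecting the "significant change", so both are assumptions here: the score is taken to be a density-peaks style product of a Gaussian-kernel local density and the distance to the nearest denser point, and the estimate is taken to be the k with the largest relative drop in minimal inter-center distance between k and k+1. The names `static_scores`, `estimate_k`, `dc`, and `k_max` are hypothetical and do not come from the paper.

```python
import numpy as np


def static_scores(X, dc):
    """Score every point once (assumed model: local density * distance to denser point)."""
    # Full pairwise Euclidean distance matrix.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Gaussian-kernel density with cutoff dc; subtract the self term exp(0) = 1.
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0
    delta = np.empty(len(X))
    for i in range(len(X)):
        denser = rho > rho[i]
        # The densest point has no denser neighbour; use its farthest distance instead.
        delta[i] = d[i, denser].min() if denser.any() else d[i].max()
    return rho * delta, d


def estimate_k(X, dc, k_max=10):
    """Estimate the number of clusters from statically ranked centers."""
    scores, d = static_scores(X, dc)
    order = np.argsort(-scores)[:k_max]  # ranking is computed once and never updated
    min_dist = []
    for k in range(2, k_max + 1):
        idx = order[:k]
        sub = d[np.ix_(idx, idx)]
        # Minimal pairwise distance among the top-k candidate centers.
        min_dist.append(sub[np.triu_indices(k, 1)].min())
    min_dist = np.array(min_dist)  # min_dist[j] belongs to k = j + 2
    # Assumed rule: the largest relative drop from k to k + 1 marks the estimate k.
    ratios = min_dist[:-1] / np.maximum(min_dist[1:], 1e-12)
    return int(np.argmax(ratios)) + 2, min_dist


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])
    X = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in centers])
    k, curve = estimate_k(X, dc=1.0, k_max=6)
    print(k, curve)
```

Because the ranking is fixed, increasing k only appends one more candidate center, so the minimal distance curve is non-increasing and each k requires no re-clustering. With this assumed rule the sketch can only return values between 2 and `k_max - 1`.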


Acknowledgement

This work is supported by the National Natural Science Foundation of China (61602156 and 61433012), the project of the Scientific and Technological in Henan Province (172102310677), the project of the Basic and Frontier Technology in Henan Province (142300410147) and the PhD Foundation of Henan Polytechnic University (B2012-099).

Corresponding author

Correspondence to Jianji Ren.

Copyright information

© 2020 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Zhang, X., He, Z., Jia, Z., Ren, J. (2020). Fast Estimation for the Number of Clusters. In: Wang, X., Leung, V.C.M., Li, K., Zhang, H., Hu, X., Liu, Q. (eds) 6GN for Future Wireless Networks. 6GN 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 337. Springer, Cham. https://doi.org/10.1007/978-3-030-63941-9_27

  • DOI: https://doi.org/10.1007/978-3-030-63941-9_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63940-2

  • Online ISBN: 978-3-030-63941-9

  • eBook Packages: Computer Science, Computer Science (R0)
