
A cluster validity evaluation method for dynamically determining the near-optimal number of clusters

Methodologies and Application

Soft Computing

Abstract

Cluster validity evaluation is a central problem in clustering research. To determine the optimal number of clusters, this paper proposes a new cluster validity index, the Ratio of Deviation of Sum-of-squares and Euclid distance (RDSED), and designs an RDSED-based cluster validity evaluation method for dynamically determining the near-optimal number of clusters. First, based on an analysis of intra-cluster and inter-cluster relationships, the paper defines the within-cluster sum-of-squares, the between-cluster sum-of-squares, the total sum-of-squares, the sum of intra-cluster distances and the average distance between clusters, and then constructs the RDSED index from these quantities. Second, it designs an RDSED-based evaluation method for dynamically determining the near-optimal number of clusters. In this method, the RDSED value is computed for cluster numbers from largest to smallest over the candidate range, and the index value is used to terminate the validity-verification process dynamically; the near-optimal number of clusters and the corresponding clustering partition are then obtained. Experimental results on artificial and real datasets show that, compared with several classical cluster validity evaluation methods, the proposed method finds the near-optimal number of clusters closest to the true number of clusters in most cases and effectively evaluates clustering partition results.
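The sum-of-squares and distance quantities named above can be sketched for a single hard partition as follows. This is a minimal illustration, not the paper's RDSED formula, which the abstract does not give. SSW, SSB and SST follow the standard textbook definitions, so SST = SSW + SSB holds for any partition. The definitions used for `intra_dist` (sum of unsquared point-to-centroid distances) and `avg_between` (mean distance over all centroid pairs) are assumptions, and the function name `partition_stats` is hypothetical.

```python
import math
from itertools import combinations


def _mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def _sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partition_stats(points, labels):
    """Sum-of-squares and distance statistics for one hard partition.

    Hypothetical helper, not the paper's implementation.
    points: list of equal-length numeric vectors
    labels: one cluster label per point (any hashable values)
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    centroids = {lab: _mean(members) for lab, members in clusters.items()}
    overall = _mean(points)

    # Within-cluster sum of squares: squared distance of each point to its own centroid.
    ssw = sum(_sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    # Between-cluster sum of squares: centroid spread around the overall mean,
    # weighted by cluster size.
    ssb = sum(len(m) * _sq_dist(centroids[lab], overall) for lab, m in clusters.items())
    # Total sum of squares around the overall mean (equals ssw + ssb).
    sst = sum(_sq_dist(p, overall) for p in points)

    # ASSUMED definition: sum of unsquared Euclidean point-to-centroid distances.
    intra_dist = sum(math.sqrt(_sq_dist(p, centroids[lab])) for p, lab in zip(points, labels))
    # ASSUMED definition: mean Euclidean distance over all pairs of centroids.
    pairs = list(combinations(centroids.values(), 2))
    avg_between = (
        sum(math.sqrt(_sq_dist(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    )

    return {"SSW": ssw, "SSB": ssb, "SST": sst,
            "intra_dist": intra_dist, "avg_between": avg_between}


stats = partition_stats([[0, 0], [0, 2], [10, 0], [10, 2]], [0, 0, 1, 1])
print(stats)  # SSW 4.0, SSB 100.0, SST 104.0, intra_dist 4.0, avg_between 10.0
```

SST depends only on the data, so for a fixed dataset any decrease in SSW as the number of clusters changes is matched by an equal increase in SSB. This is one concrete form of the intra-cluster and inter-cluster trade-off the abstract describes. How RDSED combines these quantities is specified in the full paper.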


Fig. 1 to Fig. 12


Notes

  1. http://archive.ics.uci.edu/ml/index.html.

  2. http://download.csdn.net/detail/u010459260/7099771.


Acknowledgements

The authors would like to thank all the referees for their constructive and insightful comments on this paper.

Funding

This study was funded by the National Natural Science Foundation of China (No. 61862042, 61762062, 61601215, 61862044); the Science and Technology Innovation Platform Project of Jiangxi Province (No. 20181BCD40005); the Major Discipline Academic and Technical Leader Training Plan Project of Jiangxi Province (No. 20172BCB22030); the Primary Research & Development Plan Project of Jiangxi Province (No. 20192BBE50075, 20181ACE50033, 20171BBE50064, 2013ZBBE50018); the Natural Science Foundation of Jiangxi Province (No. 20192BAB207019, 20192BAB207020, 20171BAB202027); and the Graduate Innovation Fund Project of Jiangxi Province (No. YC2019-S100, YC2019-S048).

Author information

Correspondence to Xiangjun Li or Xinping Zhang.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Li, X., Liang, W., Zhang, X. et al. A cluster validity evaluation method for dynamically determining the near-optimal number of clusters. Soft Comput 24, 9227–9241 (2020). https://doi.org/10.1007/s00500-019-04449-7

DOI: https://doi.org/10.1007/s00500-019-04449-7
