Development of assessment criteria for clustering algorithms

Salem, Sameh A.; Nandi, Asoke K.

doi:10.1007/s10044-007-0099-1

Theoretical Advances
Published: 26 January 2008

Volume 12, pages 79–98 (2009)
Pattern Analysis and Applications

Abstract

In this paper, new measures, called clustering performance measures (CPMs), are proposed for assessing the reliability of a clustering algorithm. The CPMs are defined using a validation measure, which determines how well the algorithm works with a given set of parameter values, and a repeatability measure, which is used to study the stability of the clustering solutions and can estimate the correct number of clusters in a dataset. The proposed CPMs can evaluate clustering algorithms that are structurally biased towards certain types of data distribution as well as those with no such structure bias. Additionally, we propose a novel cluster validity index, the V_I index, which can handle non-spherical clusters. Five clustering algorithms are evaluated on different types of real-world and synthetic data: the first type is a communications signal dataset representing one modulation scheme under a variety of noise conditions, the second comprises two breast cancer datasets, and the third consists of several synthetic datasets with arbitrarily shaped clusters. Comparisons with other methods for estimating the number of clusters indicate the applicability and reliability of the proposed V_I index and repeatability measure for correctly estimating the number of clusters.
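The abstract describes the repeatability measure only at a high level; its exact definition, like that of the CPMs and the V_I index, is in the full paper. As a rough illustration of the general idea of a resampling-based stability score, the Python sketch below clusters two random 80% subsamples with k-means, measures agreement on the shared points with the Rand index, averages over repeated trials, and picks the k with the highest mean agreement. This follows the general scheme of Ben-Hur, Elisseeff and Guyon (2002), not the authors' measure; the function names (`kmeans`, `rand_index`, `repeatability`, `choose_k`), the subsample fraction, the number of trials, the choice of the Rand index and the synthetic three-blob data are all illustrative assumptions.

```python
import math
import random
from itertools import combinations


def kmeans(points, k, rng, max_iter=100):
    """One run of Lloyd's k-means with k-means++ seeding; returns a label per point."""
    # k-means++: first centre uniform, each later centre drawn with probability
    # proportional to its squared distance from the nearest centre chosen so far.
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centres) for p in points]
        centres.append(rng.choices(points, weights=d2, k=1)[0])
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centre for every point.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same vs different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def repeatability(points, k, trials=20, frac=0.8, seed=0):
    """Hypothetical stability score: mean Rand agreement between clusterings of
    two random subsamples, compared only on the points both subsamples contain."""
    rng = random.Random(seed)
    n = len(points)
    m = int(frac * n)
    scores = []
    for _ in range(trials):
        s1 = rng.sample(range(n), m)
        s2 = rng.sample(range(n), m)
        lab1 = dict(zip(s1, kmeans([points[i] for i in s1], k, rng)))
        lab2 = dict(zip(s2, kmeans([points[i] for i in s2], k, rng)))
        common = sorted(set(s1) & set(s2))
        scores.append(rand_index([lab1[i] for i in common], [lab2[i] for i in common]))
    return sum(scores) / trials


def choose_k(points, ks=range(2, 6), **kw):
    """Return the first k with the highest repeatability, plus all scores."""
    scores = {k: repeatability(points, k, **kw) for k in ks}
    return max(scores, key=scores.get), scores


# Hypothetical test data: three well-separated Gaussian blobs, 50 points each.
data_rng = random.Random(42)
blob_centres = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.66)]
data = [(cx + data_rng.gauss(0, 0.3), cy + data_rng.gauss(0, 0.3))
        for cx, cy in blob_centres for _ in range(50)]

best_k, scores = choose_k(data)
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

On well-separated blobs like these, k = 3 should receive a mean agreement close to 1, while k = 2 and k = 4 tend to score lower because different runs merge or split different blobs. Stability alone can mislead, though: a wrong k can be perfectly repeatable if the algorithm lands on the same partition every time, so a stability score is best read alongside a validity criterion, and the proposed CPMs combine the two.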

Notes

This paper is a significant extension of [24], with further investigations and experimental results.
Software code for the CPMs may be available on request from sameh.salem@liverpool.ac.uk.

References

1. Webb AR (2003) Statistical pattern recognition. Wiley, New York
2. Theodoridis S, Koutroumbas K (2003) Pattern recognition. Academic Press, New York
3. Halkidi M, Batistakis Y, Vazirgiannis M (2002) Cluster validity methods: part I. SIGMOD Record 31(2):40–45
4. Halkidi M, Batistakis Y, Vazirgiannis M (2002) Cluster validity methods: part II. SIGMOD Record 31(3):19–27
5. Davies DL, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell 1:224–227
6. Dunn JC (1973) A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J Cybern 3:32–57
7. Caliński T, Harabasz J (1974) A dendrite method for cluster analysis. Commun Stat 3:1–27
8. Maulik U, Bandyopadhyay S (2002) Performance evaluation of some clustering algorithms and validity indices. IEEE Trans Pattern Anal Mach Intell 24(12):1650–1654
9. Milligan GW, Cooper MC (1985) An examination of procedures for determining the number of clusters in a data set. Psychometrika 50(2):159–179
10. Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41(8):578–588
11. Bezdek JC, Pal NR (1998) Some new indexes of cluster validity. IEEE Trans Syst Man Cybern Part B 28(3):301–315
12. Xie XL, Beni G (1991) A validity measure for fuzzy clustering. IEEE Trans Pattern Anal Mach Intell 13:841–847
13. Chou C, Su M, Lai E (2004) A new cluster validity measure and its application to image compression. Pattern Anal Appl 7:205–220
14. Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters via the gap statistic. J R Stat Soc B 63(2):411–423
15. Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Englewood Cliffs
16. Law MH, Jain AK (2003) Cluster validity by bootstrapping partitions. Technical report MSU-CSE-03-5, Department of Computer Science and Engineering, Michigan State University
17. Lange T, Braun M, Buhmann JM (2004) Stability-based validation of clustering solutions. Neural Comput 16:1299–1323
18. Ben-Hur A, Elisseeff A, Guyon I (2002) A stability based method for discovering structure in clustered data. In: Pacific symposium on biocomputing. World Scientific, Singapore, pp 6–17
19. Levine E, Domany E (2001) Resampling method for unsupervised estimation of cluster validity. Neural Comput 13:2573–2593
20. Jain A, Moreau J (1987) Bootstrap techniques in cluster analysis. Pattern Recognit 20:547–568
21. Tibshirani R, Walther G, Botstein D, Brown P (2001) Cluster validation by prediction strength. Technical report, Statistics Department, Stanford University, Stanford, CA
22. Dudoit S, Fridlyand J (2002) A prediction-based resampling method for estimating the number of clusters in a data set. Genome Biol 3(7). Available online: http://genomebiology.com/2002/317/research/0036
23. Lange T, Braun M, Roth V, Buhmann JM (2002) Stability-based model selection. Adv Neural Inf Process Syst 15:617–624
24. Salem SA, Nandi AK (2005) New assessment criteria for clustering algorithms. In: Proceedings of the IEEE international workshop on machine learning for signal processing, Mystic, CT, USA, pp 285–290
25. Proakis JG (2001) Digital communications. McGraw-Hill, Boston
26. UCI Machine Learning Repository. http://www.ics.uci.edu/~mlearn/MLRepository.html
27. Ng AY, Jordan MI, Weiss Y (2002) On spectral clustering: analysis and an algorithm. In: Advances in neural information processing systems, vol 14. MIT Press, Cambridge
28. Cormen TH, Leiserson CE, Rivest RL, Stein C (2001) Introduction to algorithms. The MIT Press, London. ISBN-10: 0-262-03293-7
29. Fischer B, Buhmann JM (2003) Path based clustering for grouping smooth curves and texture segmentation. IEEE Trans Pattern Anal Mach Intell 25:1–6
30. Kaufman L, Rousseeuw P (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York
31. Fonseca JRS, Cardoso MGMS (2007) Mixture-model cluster analysis using information theoretical criteria. Intell Data Anal 11:155–173
32. Kverh B, Leonardis A (2004) A generalisation of model selection criteria. Pattern Anal Appl 7:51–65
33. Hu T, Sung Y (2005) Clustering spatial data with a hybrid EM approach. Pattern Anal Appl 8:139–148
34. Hu X, Xu L (2004) Investigation on several model selection criteria for determining the number of clusters. Neural Inf Process Lett Rev 4:1–10
35. Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323
36. Han J, Kamber M (2001) Data mining: concepts and techniques. Morgan Kaufmann, San Francisco
37. Kohonen T (1997) Self-organizing maps. Springer, Heidelberg
38. Chen T, Chen L-K, Ma K-K (1999) Colour image indexing using SOM for region-of-interest retrieval. Pattern Anal Appl 2(2):164–171
39. Zhang S, Ganesan R, Xistris GD (1996) Self-organizing neural networks for automated machinery monitoring systems. Mech Syst Signal Process 10(5):517–532
40. Chen CW, Luo J, Parker KJ (1998) Image segmentation via adaptive k-means clustering and knowledge-based morphological operations with biomedical applications. IEEE Trans Image Process 7(12):1673–1683
41. Frigui H (2005) Unsupervised learning of arbitrarily shaped clusters using ensembles of Gaussian models. Pattern Anal Appl 8:32–49
42. Pelleg D, Moore AW (2000) X-means: extending K-means with efficient estimation of the number of clusters. In: Seventeenth international conference on machine learning. Morgan Kaufmann, San Francisco, pp 727–734
43. Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Plenum Press, New York
44. Yang MS (1993) A survey of fuzzy clustering. Math Comput Modell 18:1–16
45. Baraldi A, Blonda P (1999) A survey of fuzzy clustering algorithms for pattern recognition. IEEE Trans Syst Man Cybern Part B 29(6):778–801
46. Xu W, Nandi AK, Zhang J (2003) Novel fuzzy reinforcement learning vector quantization algorithm and its application in image compression. IEE Proc Vis Image Signal Process 150(5):292–298
47. Ester M, Kriegel H-P, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the second international conference on knowledge discovery and data mining (KDD), Portland, OR, USA, pp 226–231
48. Ankerst M, Breunig MM, Kriegel H-P, Sander J (1999) OPTICS: ordering points to identify the clustering structure. In: Proceedings of the ACM SIGMOD international conference on management of data. ACM Press, Philadelphia, SIGMOD Record 28(2):49–60
49. Jack LB, Nandi AK (2004) Microarray data using the self organising oscillator network. In: Proceedings of EUSIPCO 2004, Vienna, Austria, pp 2183–2186
50. Von Luxburg U (2006) A tutorial on spectral clustering. Technical report no. TR-149, Max Planck Institute for Biological Cybernetics

Acknowledgments

The authors would like to acknowledge the financial support of the Egyptian Ministry of Higher Education, Egypt, for S. A. Salem, and many fruitful discussions with Dr. L. B. Jack, formerly of the University of Liverpool.

Author information

Authors and Affiliations

Signal Processing and Communications Group, Department of Electrical Engineering and Electronics, The University of Liverpool, Brownlow Hill, Liverpool, L69 3GJ, UK
Sameh A. Salem & Asoke K. Nandi

Corresponding author

Correspondence to Sameh A. Salem.

About this article

Cite this article

Salem, S.A., Nandi, A.K. Development of assessment criteria for clustering algorithms. Pattern Anal Applic 12, 79–98 (2009). https://doi.org/10.1007/s10044-007-0099-1

Received: 29 November 2006
Accepted: 04 December 2007
Issue Date: February 2009
DOI: https://doi.org/10.1007/s10044-007-0099-1
