Abstract
As humans, we have innate faculties that allow us to segment groups of objects efficiently. Computers can, to some degree, be programmed with similar categorical capabilities, which stem from exploratory data analysis. Among the various kinds of data reasoning, clustering provides insight into the structure and relationships of input samples drawn from a number of distributions. To determine these relationships, many clustering methods rely on one or more human inputs, the most important being the number of distributions, c, to seek. This work investigates a technique for estimating the number of clusters from a general type of data called relational data. Several numerical examples are presented to illustrate the effectiveness of the proposed method.
Acknowledgments
This work was funded by the National Science Foundation under ITR grant number IIS-0428420. The authors would also like to thank the reviewers for their insightful comments that helped to improve the quality of this manuscript.
Cite this article
Sledge, I.J., Havens, T.C., Huband, J.M. et al. Finding the number of clusters in ordered dissimilarities. Soft Comput 13, 1125–1142 (2009). https://doi.org/10.1007/s00500-009-0421-5
DOI: https://doi.org/10.1007/s00500-009-0421-5