Abstract
Comparing clustering algorithms is much more difficult than comparing classification algorithms, owing to the unsupervised nature of the task and the lack of a precisely stated objective. We consider explorative cluster analysis as a predictive task (predicting regions where the data lump together) and propose a measure to evaluate performance on a hold-out test set. Performance is discussed for typical situations, and results on artificial and real-world datasets are presented for partitional, hierarchical, and density-based clustering algorithms. The proposed S-measure successfully captures the individual strengths and weaknesses of each algorithm.
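The abstract does not define the S-measure, so the sketch below does not implement it. It only illustrates the general idea of judging a clustering on held-out data. The approach fits k-means on a training half and uses the centers to "predict" cluster labels for the test half. It then compares those predictions with a clustering fitted on the test half alone, using the Rand index (Rand, 1971). This hold-out agreement check is a swapped-in, resampling-style technique. The function names `kmeans`, `nearest`, `rand_index` and `holdout_agreement`, the 2-D point type, and the fixed iteration cap are all hypothetical choices made for illustration.

```python
import random
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # hypothetical: 2-D points only, for brevity


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest(p: Point, centers: Sequence[Point]) -> int:
    """Index of the closest center; this is how k-means 'predicts' a region."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def kmeans(points: Sequence[Point], k: int, seed: int = 0, iters: int = 50) -> List[Point]:
    """Plain Lloyd iterations from k randomly sampled data points; returns the centers."""
    rng = random.Random(seed)
    centers = rng.sample(list(points), k)
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        new_centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers


def rand_index(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of point pairs on which two labelings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def holdout_agreement(train: Sequence[Point], test: Sequence[Point], k: int, seed: int = 0) -> float:
    """Hypothetical hold-out check: train-predicted labels vs. a clustering of the test set alone."""
    train_centers = kmeans(train, k, seed)
    test_centers = kmeans(test, k, seed)
    predicted = [nearest(p, train_centers) for p in test]
    direct = [nearest(p, test_centers) for p in test]
    return rand_index(predicted, direct)


if __name__ == "__main__":
    rng = random.Random(1)
    # Two well-separated Gaussian blobs, shuffled and split in half.
    data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for cx, cy in [(0, 0), (6, 6)] for _ in range(40)]
    rng.shuffle(data)
    print(holdout_agreement(data[:40], data[40:], k=2))
```

A value near 1 means the regions found on the training data carry over to unseen data. Lower values suggest structure that does not generalise. The Rand index is label-permutation invariant, so the arbitrary numbering of clusters in the two runs does not matter.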
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Höppner, F. (2009). How Much True Structure Has Been Discovered? In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2009. Lecture Notes in Computer Science, vol 5632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03070-3_29
DOI: https://doi.org/10.1007/978-3-642-03070-3_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03069-7
Online ISBN: 978-3-642-03070-3
eBook Packages: Computer Science, Computer Science (R0)