How Much True Structure Has Been Discovered?

Validating Explorative Clustering on a Hold-Out Test Set

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5632)

Abstract

Comparing clustering algorithms is much more difficult than comparing classification algorithms, because the task is unsupervised and lacks a precisely stated objective. We treat explorative cluster analysis as a predictive task (predicting the regions where data lumps together) and propose a measure, the S-measure, that evaluates performance on a hold-out test set. We discuss the performance in typical situations and present results on artificial and real-world datasets for partitional, hierarchical, and density-based clustering algorithms. The proposed S-measure successfully senses the individual strengths and weaknesses of each algorithm.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Höppner, F. (2009). How Much True Structure Has Been Discovered? In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2009. Lecture Notes in Computer Science, vol 5632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03070-3_29

  • DOI: https://doi.org/10.1007/978-3-642-03070-3_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03069-7

  • Online ISBN: 978-3-642-03070-3

  • eBook Packages: Computer Science, Computer Science (R0)
