Assessing Clustering Reliability and Features Informativeness by Random Permutations

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 4694))

Abstract

Assessing the quality of a clustering outcome is a challenging task that can be cast in a number of different frameworks, depending on the specific subtask, such as estimating the right number of clusters or quantifying how much the data support the partition given by the algorithm. In this paper we propose a computationally intensive procedure to evaluate: (i) the consistency of a clustering solution, (ii) the informativeness of each feature and (iii) the most suitable value for a parameter. The proposed approach does not depend on the specific clustering algorithm chosen; it is based on random permutations and produces an ensemble of empirical probability distributions of a quality index. From this ensemble it is possible to extract hints on how single features affect the clustering outcome, how consistent the clustering result is and what the most suitable value for a parameter is (e.g. the correct number of clusters). Results on simulated and real data highlight a surprisingly effective discriminative power.
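The abstract does not spell out the procedure, so the following is only a minimal sketch of one way a permutation-based ensemble of this kind could be built, not the authors' method. It assumes that each feature is permuted across samples one at a time, that a plain k-means stands in for the "any clustering algorithm" of the abstract, and that the quality index is the fraction of variance explained by the partition. All function names (`kmeans`, `quality_index`, `permutation_ensemble`, `make_toy_data`) and the toy data are hypothetical.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=30):
    """Plain Lloyd's algorithm; an assumed stand-in for any clustering method."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels, centers


def quality_index(X, k, rng, n_restarts=3):
    """Assumed index: 1 - within/total sum of squares (higher is better)."""
    total = ((X - X.mean(axis=0)) ** 2).sum()
    best = 0.0
    for _ in range(n_restarts):
        labels, centers = kmeans(X, k, rng)
        within = ((X - centers[labels]) ** 2).sum()
        best = max(best, 1.0 - within / total)
    return best


def permutation_ensemble(X, k, n_perm=20, seed=0):
    """Return the index on the original data and, for each feature, the
    empirical distribution of the index after shuffling that feature only."""
    rng = np.random.default_rng(seed)
    observed = quality_index(X, k, rng)
    ensemble = np.empty((X.shape[1], n_perm))
    for f in range(X.shape[1]):
        for p in range(n_perm):
            Xp = X.copy()
            # Shuffling one column breaks its link with the other features.
            Xp[:, f] = rng.permutation(Xp[:, f])
            ensemble[f, p] = quality_index(Xp, k, rng)
    return observed, ensemble


def make_toy_data(n=100, seed=0):
    """Two groups separated jointly along features 0 and 1; feature 2 is noise."""
    rng = np.random.default_rng(seed)
    sign = np.repeat([-1.0, 1.0], n // 2)
    X = rng.normal(size=(n, 3))
    X[:, 0] += 3.0 * sign
    X[:, 1] += 3.0 * sign
    return X
```

Under these assumptions, a feature whose permutation shifts the index distribution well below the unpermuted value would be read as informative, while a feature whose permutation leaves the distribution almost unchanged would be read as uninformative. Repeating the run for several values of `k` gives one ensemble per candidate parameter value.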



Editor information

Bruno Apolloni, Robert J. Howlett, Lakhmi Jain

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ceccarelli, M., Maratea, A. (2007). Assessing Clustering Reliability and Features Informativeness by Random Permutations. In: Apolloni, B., Howlett, R.J., Jain, L. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2007. Lecture Notes in Computer Science, vol 4694. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74829-8_107

  • DOI: https://doi.org/10.1007/978-3-540-74829-8_107

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74828-1

  • Online ISBN: 978-3-540-74829-8

  • eBook Packages: Computer Science, Computer Science (R0)
