Assessing Clustering Reliability and Features Informativeness by Random Permutations

Ceccarelli, Michele; Maratea, Antonio

doi:10.1007/978-3-540-74829-8_107

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4694)

Abstract

Assessing the quality of a clustering outcome is a challenging task that can be cast in a number of different frameworks, depending on the specific subtask, such as estimating the right number of clusters or quantifying how much the data support the partition given by the algorithm. In this paper we propose a computationally intensive procedure to evaluate: (i) the consistency of a clustering solution, (ii) the informativeness of each feature and (iii) the most suitable value for a parameter. The proposed approach does not depend on the specific clustering algorithm chosen; it is based on random permutations and produces an ensemble of empirical probability distributions of a quality index. From this ensemble it is possible to extract hints on how single features affect the clustering outcome, how consistent the clustering result is, and what the most suitable value for a parameter is (e.g. the correct number of clusters). Results on simulated and real data highlight a surprisingly effective discriminative power.
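
The abstract does not spell out the procedure, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that "random permutations" means shuffling one feature at a time across samples, re-clustering, and collecting the resulting values of a quality index into one empirical distribution per feature. The clusterer (a plain k-means), the quality index (fraction of variance explained by the partition) and all function names are assumptions chosen for illustration; any other clustering algorithm or index could be substituted.

```python
import numpy as np


def kmeans_labels(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means (a stand-in clusterer); returns one label per row."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def quality_index(X, labels):
    """Assumed index: share of total variance explained by the partition."""
    total = ((X - X.mean(axis=0)) ** 2).sum()
    within = sum(
        ((X[labels == j] - X[labels == j].mean(axis=0)) ** 2).sum()
        for j in np.unique(labels)
    )
    return 1.0 - within / total


def permutation_ensemble(X, k, n_perm=100, seed=0):
    """Return the unpermuted index and an (n_features, n_perm) array of
    index values obtained after shuffling one feature at a time."""
    rng = np.random.default_rng(seed)
    baseline = quality_index(X, kmeans_labels(X, k, rng))
    scores = np.empty((X.shape[1], n_perm))
    for f in range(X.shape[1]):
        for p in range(n_perm):
            Xp = X.copy()
            Xp[:, f] = rng.permutation(Xp[:, f])  # break feature f's link to the rest
            scores[f, p] = quality_index(Xp, kmeans_labels(Xp, k, rng))
    return baseline, scores


# Synthetic demo: features 0 and 1 carry the two-group structure, feature 2 is noise.
demo_rng = np.random.default_rng(42)
group = np.repeat([0.0, 10.0], 60)
X = np.column_stack([
    group + demo_rng.normal(size=120),
    group + demo_rng.normal(size=120),
    demo_rng.normal(size=120),
])
baseline, scores = permutation_ensemble(X, k=2, n_perm=30)
print("baseline index:", baseline)
print("mean index after permuting each feature:", scores.mean(axis=1))
```

Under these assumptions, a feature whose permutation shifts the distribution of the index well below the unpermuted value would be read as informative, while a feature whose permutation leaves the index essentially unchanged would be read as uninformative. Repeating the loop over several values of `k` would give one ensemble per candidate value, which is one way the parameter-selection use mentioned in the abstract could be approached.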

Author information

Authors and Affiliations

Research Centre On Software Technology, University of Sannio, via Traiano 11, Benevento, Italy
Michele Ceccarelli & Antonio Maratea

Editor information

Bruno Apolloni, Robert J. Howlett, Lakhmi Jain

About this paper

Cite this paper

Ceccarelli, M., Maratea, A. (2007). Assessing Clustering Reliability and Features Informativeness by Random Permutations. In: Apolloni, B., Howlett, R.J., Jain, L. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2007. Lecture Notes in Computer Science, vol 4694. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74829-8_107

DOI: https://doi.org/10.1007/978-3-540-74829-8_107
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74828-1
Online ISBN: 978-3-540-74829-8
eBook Packages: Computer Science, Computer Science (R0)
