Abstract
The validation of clusters discovered in bio-molecular data is a central issue in bioinformatics. Recently, stability-based methods have been successfully applied to assess the reliability of clusterings with a relatively small number of examples and clusters. However, several problems in functional genomics involve a very large number of examples and clusters. We present a stability-based algorithm to discover significant clusters in hierarchical clusterings with a large number of examples and clusters. Preliminary results on gene expression data from patients affected by Human Myeloid Leukemia show how to apply the proposed method when thousands of gene clusters are involved.
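To make the general idea of stability-based validation concrete, the following is a minimal illustrative sketch and not the algorithm proposed in the paper. It assumes average-linkage agglomerative clustering, perturbation by random subsampling, and the Jaccard coefficient as the similarity between a cluster of the original tree and the clusters of a perturbed tree. All function names and parameters (`hierarchical_clusters`, `cluster_stability`, `n_perturb`, `frac`) are hypothetical, and the toy data are invented for illustration.

```python
import random
from itertools import combinations


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def hierarchical_clusters(points, ids):
    """Average-linkage agglomerative clustering (assumed linkage choice).

    Returns every internal node of the tree as a frozenset of example ids.
    """
    active = [frozenset([i]) for i in ids]
    nodes = []

    def linkage(pair):
        c1, c2 = pair
        total = sum(dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(active) > 1:
        a, b = min(combinations(active, 2), key=linkage)
        active = [c for c in active if c != a and c != b]
        active.append(a | b)
        nodes.append(a | b)
    return nodes


def jaccard(a, b):
    """Jaccard coefficient between two non-empty sets."""
    return len(a & b) / len(a | b)


def cluster_stability(points, n_perturb=20, frac=0.8, seed=0):
    """Score each cluster of the original tree by how well it is recovered
    in trees built from random subsamples (assumed perturbation scheme)."""
    rng = random.Random(seed)
    ids = list(range(len(points)))
    # Skip the root: the cluster containing every example is trivially stable.
    reference = [c for c in hierarchical_clusters(points, ids) if len(c) < len(ids)]
    totals = {c: 0.0 for c in reference}
    counts = {c: 0 for c in reference}
    size = max(2, round(frac * len(ids)))
    for _ in range(n_perturb):
        sample = rng.sample(ids, size)
        kept = frozenset(sample)
        perturbed = hierarchical_clusters(points, sample)
        for c in reference:
            restricted = c & kept
            if len(restricted) < 2:
                continue  # too few surviving members to compare
            totals[c] += max(jaccard(restricted, p) for p in perturbed)
            counts[c] += 1
    return {c: totals[c] / counts[c] for c in reference if counts[c] > 0}


# Toy data: two well-separated groups of five points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
scores = cluster_stability(points)
for cluster, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(sorted(cluster), round(score, 2))
```

In this sketch, a cluster whose score stays close to 1 across perturbations would be considered reliable, while low-scoring clusters are likely artifacts of the particular sample. The naive clustering routine is only suitable for toy inputs; data with thousands of clusters, as discussed in the abstract, would need an efficient implementation.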
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Avogadri, R., Brioschi, M., Ruffino, F., Ferrazzi, F., Beghini, A., Valentini, G. (2008). An Algorithm to Assess the Reliability of Hierarchical Clusters in Gene Expression Data. In: Lovrek, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2008. Lecture Notes in Computer Science, vol 5179. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85567-5_95
DOI: https://doi.org/10.1007/978-3-540-85567-5_95
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85566-8
Online ISBN: 978-3-540-85567-5
eBook Packages: Computer Science (R0)