An Algorithm to Assess the Reliability of Hierarchical Clusters in Gene Expression Data

  • Conference paper
Knowledge-Based Intelligent Information and Engineering Systems (KES 2008)

Abstract

The validation of clusters discovered in bio-molecular data is a central issue in bioinformatics. Recently, stability-based methods have been successfully applied to assess the reliability of clusterings with a relatively small number of examples and clusters. However, several problems in functional genomics involve a very large number of examples and clusters. We present a stability-based algorithm to discover significant clusters in hierarchical clusterings with a large number of examples and clusters. Preliminary results on gene expression data from patients affected by Human Myeloid Leukemia show how to apply the proposed method when thousands of gene clusters are involved.
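
The abstract does not spell out the algorithm, so the following is only a rough sketch of what a generic stability-based reliability score for hierarchical clusters can look like. It is not the authors' method. It assumes Gaussian noise as the perturbation, naive average-linkage clustering cut at a fixed number of clusters, and the Jaccard coefficient as the similarity measure; the function names `average_linkage`, `jaccard` and `cluster_stability` and all parameters are hypothetical.

```python
import random


def average_linkage(points, k):
    """Naive agglomerative clustering (average linkage), cut at k clusters.

    Returns a list of sets of point indices. Intended for tiny examples only.
    """
    clusters = [{i} for i in range(len(points))]

    def dist(a, b):
        # Euclidean distance between points a and b (given by index).
        return sum((x - y) ** 2 for x, y in zip(points[a], points[b])) ** 0.5

    def linkage(c1, c2):
        # Average pairwise distance between two clusters.
        return sum(dist(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        # Merge the closest pair of clusters.
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] | clusters[j]
        del clusters[j]
    return clusters


def jaccard(a, b):
    """Jaccard coefficient between two sets of indices."""
    return len(a & b) / len(a | b)


def cluster_stability(points, k, n_rep=20, noise=0.1, seed=0):
    """Score each reference cluster by how well it is recovered under noise.

    Assumption: the perturbation is additive Gaussian noise; other schemes
    (subsampling, random projections) could be swapped in.
    Returns a list of (sorted member indices, mean best-match Jaccard).
    """
    rng = random.Random(seed)
    reference = average_linkage(points, k)
    totals = [0.0] * len(reference)
    for _ in range(n_rep):
        noisy = [[x + rng.gauss(0, noise) for x in p] for p in points]
        perturbed = average_linkage(noisy, k)
        for idx, cluster in enumerate(reference):
            # Best match of this reference cluster in the perturbed clustering.
            totals[idx] += max(jaccard(cluster, q) for q in perturbed)
    return [(sorted(c), t / n_rep) for c, t in zip(reference, totals)]


if __name__ == "__main__":
    # Two well-separated toy groups standing in for expression profiles.
    data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    for members, score in cluster_stability(data, k=2):
        print(members, round(score, 3))
```

A score close to 1 means the cluster reappears almost unchanged in the perturbed clusterings, while a low score suggests the cluster may be an artifact of the particular data sample. The naive clustering here is far too slow for thousands of gene clusters and serves only to make the idea concrete.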

Editor information

Ignac Lovrek Robert J. Howlett Lakhmi C. Jain

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Avogadri, R., Brioschi, M., Ruffino, F., Ferrazzi, F., Beghini, A., Valentini, G. (2008). An Algorithm to Assess the Reliability of Hierarchical Clusters in Gene Expression Data. In: Lovrek, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2008. Lecture Notes in Computer Science, vol 5179. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85567-5_95

  • DOI: https://doi.org/10.1007/978-3-540-85567-5_95

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85566-8

  • Online ISBN: 978-3-540-85567-5

  • eBook Packages: Computer Science, Computer Science (R0)