A Tree Index to Support Clustering Based Exploratory Data Analysis

  • Conference paper
Bioinformatics Research and Development (BIRD 2008)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 13)

Abstract

In microarray data analysis, visualizations based on agglomerative clustering results are widely used to help biomedical researchers form a mental model of their data. To support the choice of algorithm and parameterization, we propose a novel cluster index, the tree index (TI), which evaluates hierarchical clustering results with regard to their visual appearance and their agreement with available background information. Visually appealing cluster trees are characterized by splits that separate homogeneous clusters, that is, clusters with low inner-cluster variance whose members share a medical class label, from the rest of the data. To evaluate cluster trees with respect to this property, the TI computes the likeliness of every single split in the cluster tree. Computing TIs for different algorithms and parameterizations makes it possible to identify the most appealing cluster tree among the many tree visualizations obtained. The approach is applied to simulated data as well as to two publicly available cancer data sets.
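
The abstract does not state the TI formula, so the following is only a rough sketch of the general idea of scoring each split of a cluster tree by how unlikely its class-label distribution would be under chance. It is not the authors' definition. The multivariate hypergeometric null model, the nested-tuple tree representation, and the function names `leaf_labels`, `split_probability`, and `split_scores` are all assumptions made for illustration, and the inner-cluster variance mentioned in the abstract is ignored.

```python
from collections import Counter
from math import comb, log


def leaf_labels(tree):
    """Collect class labels at the leaves of a nested-tuple binary tree.

    Assumed representation: a leaf is a label string, an inner node is a
    (left, right) tuple.
    """
    if isinstance(tree, tuple):
        left, right = tree
        return leaf_labels(left) + leaf_labels(right)
    return [tree]


def split_probability(left_labels, right_labels):
    """Chance probability of the left child's label counts.

    Assumed null model (not taken from the paper): the left child is a
    random draw without replacement from the parent's samples, so its
    label counts follow a multivariate hypergeometric distribution.
    """
    left = Counter(left_labels)
    parent = left + Counter(right_labels)
    ways = 1
    for label, total in parent.items():
        ways *= comb(total, left[label])
    return ways / comb(sum(parent.values()), len(left_labels))


def split_scores(tree):
    """Return -log(probability) for every split, root first.

    A larger score means the split's label separation is less likely
    to have arisen by chance under the assumed null model.
    """
    if not isinstance(tree, tuple):
        return []
    left, right = tree
    p = split_probability(leaf_labels(left), leaf_labels(right))
    return [-log(p)] + split_scores(left) + split_scores(right)


# Two toy trees over the same four samples (two of class "A", two of "B").
pure_tree = (("A", "A"), ("B", "B"))    # root split separates the classes
mixed_tree = (("A", "B"), ("A", "B"))   # root split mixes the classes

print(split_scores(pure_tree))
print(split_scores(mixed_tree))
```

In this toy setting the root split of `pure_tree` receives a higher score than the root split of `mixed_tree`, which matches the intuition that splits isolating a single class are the ones a reader finds visually convincing. How per-split values are combined into a single index is left open here, because the abstract does not specify it.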

Editor information

Mourad Elloumi Josef Küng Michal Linial Robert F. Murphy Kristan Schneider Cristian Toma

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Martin, C., Nattkemper, T.W. (2008). A Tree Index to Support Clustering Based Exploratory Data Analysis. In: Elloumi, M., Küng, J., Linial, M., Murphy, R.F., Schneider, K., Toma, C. (eds) Bioinformatics Research and Development. BIRD 2008. Communications in Computer and Information Science, vol 13. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70600-7_1

  • DOI: https://doi.org/10.1007/978-3-540-70600-7_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70598-7

  • Online ISBN: 978-3-540-70600-7

  • eBook Packages: Computer Science, Computer Science (R0)
