A Tree Index to Support Clustering Based Exploratory Data Analysis

Martin, Christian; Nattkemper, Tim W.

doi:10.1007/978-3-540-70600-7_1

Part of the book series: Communications in Computer and Information Science (CCIS, volume 13)

Included in the following conference series:

International Conference on Bioinformatics Research and Development

Abstract

In microarray data analysis, visualizations based on agglomerative clustering results are widely used to help biomedical researchers form a mental model of their data. To support the selection of the algorithm and parameterization to be applied, we propose a novel cluster index, the tree index (TI), which evaluates hierarchical cluster results with regard to their visual appearance and their accordance with available background information. Visually appealing cluster trees are characterized by splits that separate homogeneous clusters, which have low inner-cluster variance and share a medical class label, from the rest of the data. To evaluate cluster trees with regard to this property, the TI computes the likelihood of every single split in the cluster tree. Computing TIs for different algorithms and parameterizations makes it possible to identify the most appealing cluster tree among the many possible tree visualizations obtained. Its application is shown on simulated data as well as on two publicly available cancer data sets.
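
The idea of scoring every split of a cluster tree against class labels can be illustrated with a minimal sketch. The abstract does not define the TI formula, so everything below is an assumption made for illustration: the nested-tuple tree representation, the helper names `leaves`, `split_probability` and `split_scores`, and the choice of a multivariate hypergeometric probability as the per-split score. It is not the authors' definition of the TI.

```python
from collections import Counter
from math import comb, log


def leaves(tree):
    """Return the class labels of all leaves below a node.

    A tree is either a label string (leaf) or a pair (left, right).
    This representation is a hypothetical choice for the sketch.
    """
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def split_probability(left_labels, right_labels):
    """Probability of the left child's label counts under random splitting.

    Assumption: the split is modeled as drawing len(left_labels) items
    without replacement from the parent (multivariate hypergeometric).
    """
    parent = Counter(left_labels) + Counter(right_labels)
    left = Counter(left_labels)
    numerator = 1
    for label, total in parent.items():
        numerator *= comb(total, left[label])
    return numerator / comb(sum(parent.values()), len(left_labels))


def split_scores(tree):
    """Return -log(probability) for every split, root first."""
    if isinstance(tree, str):
        return []
    left, right = tree
    score = -log(split_probability(leaves(left), leaves(right)))
    return [score] + split_scores(left) + split_scores(right)


# Two toy trees over the same eight labeled samples.
pure_tree = ((("A", "A"), ("A", "A")), (("B", "B"), ("B", "B")))
mixed_tree = ((("A", "B"), ("A", "B")), (("A", "B"), ("A", "B")))

for name, tree in [("pure", pure_tree), ("mixed", mixed_tree)]:
    print(name, [round(s, 3) for s in split_scores(tree)])
```

In this toy setting the tree whose root separates the two classes has a single, very unlikely split, while the mixed tree spreads small scores over many splits. How such per-split values are combined into one index is not stated in the abstract and is deliberately left out of the sketch.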

Author information

Authors and Affiliations

Technical Faculty AG Applied Neuroinformatics, Bielefeld University,
Christian Martin & Tim W. Nattkemper

Authors

Christian Martin
Tim W. Nattkemper

Editor information

Mourad Elloumi, Josef Küng, Michal Linial, Robert F. Murphy, Kristan Schneider, Cristian Toma

About this paper

Cite this paper

Martin, C., Nattkemper, T.W. (2008). A Tree Index to Support Clustering Based Exploratory Data Analysis. In: Elloumi, M., Küng, J., Linial, M., Murphy, R.F., Schneider, K., Toma, C. (eds) Bioinformatics Research and Development. BIRD 2008. Communications in Computer and Information Science, vol 13. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70600-7_1

DOI: https://doi.org/10.1007/978-3-540-70600-7_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70598-7
Online ISBN: 978-3-540-70600-7
eBook Packages: Computer Science, Computer Science (R0)