Unsupervised Decision Trees Structured by Gene Ontology (GO-UDTs) for the Interpretation of Microarray Data

  • Conference paper

Abstract

Unsupervised data mining of microarray gene expression data is a standard approach for finding relevant groups of genes as well as samples. Clustering of samples is important for finding, for example, disease subtypes or related treatments. Unfortunately, most sample-wise clustering methods do not facilitate the biological interpretation of the results. We propose a novel approach for sample-wise clustering of microarray data that computes dendrograms with Gene Ontology terms annotated to each node. These dendrograms resemble decision trees with simple rules, which can help to find biologically meaningful differences between the sample groups. We have applied our method to a gene expression data set from a study of prostate cancer. The original clustering, which contains clinically relevant features, is well reproduced; in addition, our unsupervised decision tree rules give hints toward a biological explanation of the clusters.
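The general idea of a sample dendrogram whose nodes carry term-based rules can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not state how terms are scored or how splits are chosen, so the sketch assumes that each term is summarized by the mean expression of its genes and that the best split is the one explaining the largest share of variance in that summary. The names `term_A` and `term_B`, the gene labels, and all expression values are invented for illustration.

```python
from statistics import mean


def sum_sq(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_threshold(values):
    """Return (explained variance fraction, threshold) of the best 1-D two-group split."""
    ordered = sorted(values)
    best = (0.0, None)
    for i in range(1, len(ordered)):
        if ordered[i] == ordered[i - 1]:
            continue  # no threshold can separate equal values
        within = sum_sq(ordered[:i]) + sum_sq(ordered[i:])
        explained = 1.0 - within / sum_sq(ordered)
        if explained > best[0]:
            best = (explained, (ordered[i - 1] + ordered[i]) / 2)
    return best


def build_tree(expr, terms, samples, max_depth=2, min_size=2):
    """Recursively split samples; each inner node is annotated with one term."""
    if max_depth == 0 or len(samples) < 2 * min_size:
        return {"samples": samples}
    best = None
    for term, genes in terms.items():
        # Assumed term summary: mean expression of the term's genes per sample.
        scores = [mean(expr[g][s] for g in genes) for s in samples]
        explained, threshold = best_threshold(scores)
        if threshold is None:
            continue
        low = [s for s, v in zip(samples, scores) if v <= threshold]
        high = [s for s, v in zip(samples, scores) if v > threshold]
        if min(len(low), len(high)) < min_size:
            continue  # skip terms whose best split leaves a group too small
        if best is None or explained > best[0]:
            best = (explained, term, threshold, low, high)
    if best is None:
        return {"samples": samples}
    _, term, threshold, low, high = best
    return {
        "term": term,
        "rule": f"mean expression of {term} <= {threshold:.2f}",
        "low": build_tree(expr, terms, low, max_depth - 1, min_size),
        "high": build_tree(expr, terms, high, max_depth - 1, min_size),
    }


# Invented toy data: four genes measured in six samples (indices 0..5).
expr = {
    "g1": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    "g2": [1.1, 0.8, 1.0, 3.2, 2.8, 3.0],
    "g3": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "g4": [0.5, 0.4, 0.6, 0.5, 0.6, 0.4],
}
# Hypothetical term-to-gene annotation.
terms = {"term_A": ["g1", "g2"], "term_B": ["g3", "g4"]}

tree = build_tree(expr, terms, samples=list(range(6)))
print(tree["rule"])
```

In this toy example the root node is annotated with `term_A`, because its genes separate samples 0 to 2 from samples 3 to 5 far more cleanly than those of `term_B`. The rule at each inner node is what makes the tree readable as a decision tree even though no class labels were used.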

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Redestig, H., Sohler, F., Zimmer, R., Selbig, J. (2007). Unsupervised Decision Trees Structured by Gene Ontology (GO-UDTs) for the Interpretation of Microarray Data. In: Decker, R., Lenz, H.J. (eds) Advances in Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70981-7_67
