Abstract
Unsupervised data mining of microarray gene expression data is a standard approach for finding relevant groups of genes as well as samples. Clustering of samples is important for finding, for example, disease subtypes or related treatments. Unfortunately, most sample-wise clustering methods do not facilitate the biological interpretation of their results. We propose a novel approach for sample-wise clustering of microarray data that computes dendrograms with Gene Ontology terms annotated to each node. These dendrograms resemble decision trees with simple rules that can help to find biologically meaningful differences between the sample groups. We have applied our method to a gene expression data set from a study of prostate cancer. The original clustering, which captures clinically relevant features, is well reproduced, and in addition the rules of our unsupervised decision trees give hints for a biological explanation of the clusters.
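The abstract describes the approach only at a high level. As a rough illustration of the general idea of an unsupervised decision tree with one GO term per split (not the authors' GO-UDT algorithm), the sketch below assumes each sample already has a score per GO term, for instance the mean standardized expression of the genes annotated to that term. At each node it picks the single term and threshold that explain the largest fraction of that term's variance among the samples reaching the node. The score matrix, this split criterion, the term names and the parameters `min_size`, `min_gain` and `max_depth` are all hypothetical choices.

```python
# Hypothetical sketch: a toy unsupervised decision tree over per-sample GO-term scores.
# The scoring scheme and split criterion are illustrative assumptions, not the paper's method.
from __future__ import annotations

from dataclasses import dataclass
from statistics import fmean


@dataclass
class Node:
    samples: list[int]              # indices of the samples that reach this node
    term: str | None = None         # GO term used as the splitting rule (None for a leaf)
    threshold: float | None = None  # split point on that term's score
    left: Node | None = None        # samples with score <= threshold
    right: Node | None = None       # samples with score > threshold


def _sse(values: list[float]) -> float:
    """Sum of squared deviations from the mean."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(samples, scores, min_size):
    """Return (explained_fraction, term, threshold) of the best single-term split, or None."""
    best = None
    for term, vals in scores.items():
        ordered = sorted(samples, key=lambda s: vals[s])
        total = _sse([vals[s] for s in ordered])
        if total == 0:
            continue  # constant score: this term cannot separate the samples
        for i in range(min_size, len(ordered) - min_size + 1):
            lo, hi = vals[ordered[i - 1]], vals[ordered[i]]
            if lo == hi:
                continue  # no threshold separates tied scores
            within = _sse([vals[s] for s in ordered[:i]]) + _sse([vals[s] for s in ordered[i:]])
            explained = (total - within) / total  # scale-free, so terms are comparable
            if best is None or explained > best[0]:
                best = (explained, term, (lo + hi) / 2)
    return best


def build_tree(samples, scores, min_size=2, min_gain=0.5, max_depth=3):
    """Recursively split samples; stop when too few samples or no split is good enough."""
    node = Node(samples=list(samples))
    if max_depth == 0 or len(node.samples) < 2 * min_size:
        return node
    split = best_split(node.samples, scores, min_size)
    if split is None or split[0] < min_gain:
        return node
    _, node.term, node.threshold = split
    vals = scores[node.term]
    left = [s for s in node.samples if vals[s] <= node.threshold]
    right = [s for s in node.samples if vals[s] > node.threshold]
    node.left = build_tree(left, scores, min_size, min_gain, max_depth - 1)
    node.right = build_tree(right, scores, min_size, min_gain, max_depth - 1)
    return node


def rules(node, indent=""):
    """Render the tree as nested, human-readable rules."""
    if node.term is None:
        return [f"{indent}samples {node.samples}"]
    out = [f"{indent}{node.term} <= {node.threshold:.2f}:"]
    out += rules(node.left, indent + "  ")
    out.append(f"{indent}{node.term} > {node.threshold:.2f}:")
    out += rules(node.right, indent + "  ")
    return out


# Toy scores for two hypothetical GO terms, one value per sample (6 samples).
scores = {
    "term_A": [0.1, 0.2, 0.0, 2.1, 2.3, 1.9],
    "term_B": [0.5, 0.4, 0.6, 0.5, 0.7, 0.4],
}
root = build_tree(range(6), scores)
print("\n".join(rules(root)))
```

Here the root splits on `term_A`, separating samples 0 to 2 from samples 3 to 5, so each internal node reads as one simple rule on a single GO term, the kind of interpretable split the abstract describes.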
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Redestig, H., Sohler, F., Zimmer, R., Selbig, J. (2007). Unsupervised Decision Trees Structured by Gene Ontology (GO-UDTs) for the Interpretation of Microarray Data. In: Decker, R., Lenz, H.J. (eds) Advances in Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70981-7_67
DOI: https://doi.org/10.1007/978-3-540-70981-7_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70980-0
Online ISBN: 978-3-540-70981-7
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)