Abstract
Cluster methods are commonly applied to time-course gene expression data to find co-regulated genes, which can ultimately help reveal pathways and interactions between genes. Clustering is carried out either on the raw data or on functional data. In functional data analysis, a curve is fitted to each observation to account for time dependency. Because gene expression over time is biologically a continuous process, it can be represented by a continuous function. The different curve shapes found in a dataset can have important interpretations, and characteristic patterns can be found by clustering the estimated regression coefficients.
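The functional-data step can be sketched as follows. The basis is an assumption: this sketch uses a polynomial basis fitted by ordinary least squares, because the abstract does not name the basis; B-spline bases are a common alternative in functional data analysis. Each gene's time course is reduced to its estimated regression coefficients. The helper names (`design_matrix`, `solve`, `fit_coefficients`) are hypothetical and are not the authors' R code.

```python
from typing import List, Sequence


def design_matrix(times: Sequence[float], degree: int) -> List[List[float]]:
    # Row i holds the polynomial basis 1, t_i, t_i^2, ..., t_i^degree
    # (a polynomial basis is an assumption made for this sketch).
    return [[t ** p for p in range(degree + 1)] for t in times]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    # Gaussian elimination with partial pivoting for a small square system.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_coefficients(times: Sequence[float], values: Sequence[float],
                     degree: int = 2) -> List[float]:
    # Least-squares fit via the normal equations X^T X beta = X^T y.
    x = design_matrix(times, degree)
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(x, values)) for i in range(k)]
    return solve(xtx, xty)


if __name__ == "__main__":
    times = [0, 1, 2, 3, 4]
    profile = [1 + 2 * t + 0.5 * t ** 2 for t in times]
    # Should recover coefficients close to 1, 2 and 0.5.
    print(fit_coefficients(times, profile))
```

With a degree-2 basis, every gene is then described by three numbers instead of its full time course. These coefficient vectors are the input that K-Means or QT-Clust would receive when clustering functional rather than raw data.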
In this simulation study on artificial data, the well-known K-Means algorithm and the quality-based cluster algorithm QT-Clust are applied to both raw and functional data. The performance of the different methods is evaluated when different types of noise are added to the data. All cluster algorithms used are implemented in R.
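The quality-threshold idea behind QT-Clust goes back to Heyer, Kruglyak and Yooseph (1999). The sketch below is a simplified, center-based variant and rests on several assumptions:

- Every remaining point is tried as a candidate center.
- All points within a fixed `radius` of that center form its candidate cluster, using Euclidean distance (an assumption).
- The largest candidate cluster is accepted, and its members are removed.
- The procedure repeats until no candidate cluster reaches `min_size` points.
- Points that are left over stay unclustered and are labelled -1.

The function and parameter names are hypothetical. This is not the R implementation used in the study.

```python
import math
from typing import List, Sequence


def qt_cluster(points: Sequence[Sequence[float]], radius: float,
               min_size: int = 2) -> List[int]:
    """Return a cluster label per point; -1 marks points left unclustered."""
    labels = [-1] * len(points)
    remaining = set(range(len(points)))
    cluster_id = 0
    while remaining:
        best: List[int] = []
        # Try every remaining point as a candidate center and collect all
        # remaining points within `radius` (Euclidean distance) of it.
        for c in sorted(remaining):
            members = [i for i in sorted(remaining)
                       if math.dist(points[c], points[i]) <= radius]
            if len(members) > len(best):
                best = members
        # Stop once even the largest candidate cluster is too small.
        if len(best) < min_size:
            break
        for i in best:
            labels[i] = cluster_id
        remaining -= set(best)
        cluster_id += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (10, 0)]
    print(qt_cluster(pts, radius=0.5, min_size=2))  # [0, 0, 0, 1, 1, -1]
```

Unlike K-Means, this procedure does not fix the number of clusters in advance. The number of clusters follows from `radius` and `min_size`, and outlying points can remain unassigned instead of being forced into a cluster. The same function can be applied to raw expression vectors or to fitted coefficient vectors.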
Acknowledgements
This work was supported by the Austrian Kind/Knet Center of Biopharmaceutical Technology (ACBT).
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Scharl, T., Leisch, F. (2009). Quality-Based Clustering of Functional Data: Applications to Time Course Microarray Data. In: Fink, A., Lausen, B., Seidel, W., Ultsch, A. (eds) Advances in Data Analysis, Data Handling and Business Intelligence. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01044-6_62
DOI: https://doi.org/10.1007/978-3-642-01044-6_62
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01043-9
Online ISBN: 978-3-642-01044-6
eBook Packages: Mathematics and Statistics (R0)