Abstract
Clustering time-series gene expression data is a challenging problem that imposes its own particular constraints: two or more time points cannot be exchanged, since doing so would deliver quite different results and lead to erroneous biological conclusions. We focus on issues related to clustering gene expression temporal profiles and devise a novel algorithm for clustering temporal gene expression profiles from microarray data. The proposed clustering method introduces the concept of profile alignment, which is achieved by minimizing the area between two aligned profiles. The overall pattern of expression in the time-series context is obtained by applying agglomerative clustering combined with profile alignment, and the optimal number of clusters for a given dataset is found by means of a variant of a clustering index. The effectiveness of the proposed approach is demonstrated on two well-known datasets, yeast and serum, and corroborated with a set of pre-clustered yeast genes, on which the proposed method achieves very high classification accuracy even though it is an unsupervised scheme.
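To make the idea of profile alignment more concrete, here is a minimal sketch of one way to align two profiles sampled at unequal time intervals. The abstract does not give the formulation, so the following are assumptions: profiles are treated as piecewise-linear between time points, alignment is a single vertical shift of one profile, and "area" is measured as the integral of the squared difference. The function name `aligned_area` is hypothetical and is not taken from the chapter.

```python
from typing import Sequence, Tuple


def aligned_area(
    x: Sequence[float], y: Sequence[float], t: Sequence[float]
) -> Tuple[float, float]:
    """Align profile y to profile x by a vertical shift and measure what is left.

    Assumptions (not from the chapter): profiles are piecewise-linear between
    the sampling times t, and the quantity minimized is the integral of the
    squared difference between the two profiles.

    Returns (shift, area), where `shift` is the constant added to y and
    `area` is the squared area remaining after the shift.
    """
    if not (len(x) == len(y) == len(t)) or len(t) < 2:
        raise ValueError("x, y and t must have the same length (at least 2)")
    if any(t[k + 1] <= t[k] for k in range(len(t) - 1)):
        raise ValueError("time points must be strictly increasing")

    # Pointwise difference; it is piecewise-linear because x and y are.
    d = [xi - yi for xi, yi in zip(x, y)]
    steps = [t[k + 1] - t[k] for k in range(len(t) - 1)]

    # The shift minimizing the integral of (d(t) - a)^2 is the time-weighted
    # mean of d(t); the trapezoid rule is exact for a piecewise-linear d.
    shift = sum(h * (d[k] + d[k + 1]) / 2 for k, h in enumerate(steps)) / (t[-1] - t[0])

    # Residual after alignment, integrated exactly on each interval:
    # integral of a squared linear segment = h * (e0^2 + e0*e1 + e1^2) / 3.
    e = [dk - shift for dk in d]
    area = sum(
        h * (e[k] ** 2 + e[k] * e[k + 1] + e[k + 1] ** 2) / 3
        for k, h in enumerate(steps)
    )
    return shift, area


# Two profiles with the same shape but different baselines, unequal intervals.
shift, area = aligned_area([1, 2, 4, 3], [3, 4, 6, 5], [0, 1, 3, 7])
print(shift, area)
```

Because the interval lengths enter both the shift and the residual, unevenly spaced time points are weighted by how much time they cover rather than being treated as equally spaced. A residual like this could serve as the pairwise distance fed to an agglomerative clustering step, although the linkage and clustering index used in the chapter are not specified in the abstract.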
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Rueda, L., Bari, A., Ngom, A. (2008). Clustering Time-Series Gene Expression Data with Unequal Time Intervals. In: Priami, C., Dressler, F., Akan, O.B., Ngom, A. (eds) Transactions on Computational Systems Biology X. Lecture Notes in Computer Science, vol 5410. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92273-5_6
DOI: https://doi.org/10.1007/978-3-540-92273-5_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-92272-8
Online ISBN: 978-3-540-92273-5
eBook Packages: Computer Science (R0)