Abstract
This paper addresses the clustering and classification of genes that are active during cell division. Cell division ensures the proliferation of cells but becomes drastically aberrant in cancer cells. The genes studied are described by their expression profiles (i.e., time series) over the cell division cycle. The work evaluates the efficiency of four major metrics for clustering and classifying these gene expression profiles. The study is based on a random-periods model for the expression of cell-cycle genes, which accounts for the observed attenuation in cycle amplitude or duration, variations in the initial amplitude, and drift in the expression profiles.
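The abstract does not spell out the model or name the four metrics. As a rough illustration only, the sketch below assumes one common way to obtain the effects listed: each cell oscillates as a cosine whose period is log-normally distributed, so the population average attenuates over time, and a linear term adds drift. The function names, parameter values, and the two distances compared (Euclidean and a correlation-based one) are hypothetical choices for this example, not the paper's implementation.

```python
import math
import random


def random_period_profile(times, period=15.0, sigma=0.1, amplitude=1.0,
                          phase=0.0, intercept=0.0, slope=0.0,
                          n_cells=2000, seed=0):
    """Simulate a population-averaged expression profile (illustrative only).

    Each hypothetical cell follows a cosine with its own cycle length
    period * exp(sigma * z), z ~ N(0, 1). Because cells drift out of
    synchrony, the averaged oscillation attenuates with time.
    intercept + slope * t adds a linear drift; amplitude scales the
    initial oscillation size.
    """
    rng = random.Random(seed)
    periods = [period * math.exp(sigma * rng.gauss(0.0, 1.0))
               for _ in range(n_cells)]
    profile = []
    for t in times:
        mean_osc = sum(math.cos(2.0 * math.pi * t / p + phase)
                       for p in periods) / n_cells
        profile.append(intercept + slope * t + amplitude * mean_osc)
    return profile


def euclidean(x, y):
    """Euclidean distance between two equally sampled profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correlation_distance(x, y):
    """1 - Pearson correlation: ignores differences in scale and offset."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(vx * vy)


# Two genes with the same phase but different initial amplitudes.
times = [float(t) for t in range(46)]
gene_a = random_period_profile(times, amplitude=1.0)
gene_b = random_period_profile(times, amplitude=2.0)

d_euclid = euclidean(gene_a, gene_b)            # sensitive to amplitude
d_corr = correlation_distance(gene_a, gene_b)   # insensitive to amplitude
```

In this toy setting the two profiles differ only in scale, so the correlation-based distance treats them as near-identical while the Euclidean distance separates them. This is the kind of disagreement between metrics that makes the choice of distance matter for grouping genes by cell-cycle phase.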
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Diallo, A., Douzal-Chouakria, A., Giroud, F. (2009). Which Distance for the Identification and the Differentiation of Cell-Cycle Expressed Genes?. In: Adams, N.M., Robardet, C., Siebes, A., Boulicaut, JF. (eds) Advances in Intelligent Data Analysis VIII. IDA 2009. Lecture Notes in Computer Science, vol 5772. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03915-7_24
DOI: https://doi.org/10.1007/978-3-642-03915-7_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03914-0
Online ISBN: 978-3-642-03915-7
eBook Packages: Computer Science, Computer Science (R0)