Abstract
Predictive clustering is a general framework that unifies clustering and prediction. This paper investigates how to apply this framework to cluster time series data. The resulting system, Clus-TS, constructs predictive clustering trees (PCTs) that partition a given set of time series into homogeneous clusters. In addition, PCTs provide a symbolic description of the clusters. We evaluate Clus-TS on time series data from microarray experiments. Each data set records the change over time in the expression level of yeast genes as a response to a change in environmental conditions. Our evaluation shows that Clus-TS is able to cluster genes with similar responses, and to predict the time series based on the description of a gene. Clus-TS is part of a larger project where the goal is to investigate how global models can be combined with inductive databases.
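To make the idea of a tree that partitions time series into homogeneous clusters concrete, the following is a minimal sketch of a single split-selection step. It is not the Clus-TS implementation: the abstract does not specify the distance measure or the split heuristic, so the Euclidean distance, the pairwise-distance definition of variance, the variance-reduction criterion, and all names (`euclidean`, `variance`, `best_split`, the Boolean gene-description attributes) are assumptions chosen for illustration.

```python
from itertools import combinations


def euclidean(a, b):
    """Assumed distance between two equal-length time series."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def variance(series, dist=euclidean):
    """Cluster variance from pairwise distances only (no centroid needed).

    Sum of squared distances over all unordered pairs, divided by n^2.
    For Euclidean distance this equals the usual variance around the mean.
    """
    n = len(series)
    if n < 2:
        return 0.0
    return sum(dist(a, b) ** 2 for a, b in combinations(series, 2)) / (n * n)


def best_split(examples, dist=euclidean):
    """Pick the Boolean attribute whose test most reduces variance.

    examples: list of (description, series) pairs, where description is a
    dict mapping hypothetical attribute names to True/False.
    Returns (attribute, variance_reduction), or (None, 0.0) if no test helps.
    """
    all_series = [s for _, s in examples]
    total = variance(all_series, dist)
    n = len(examples)
    best = (None, 0.0)
    for attr in sorted({a for d, _ in examples for a in d}):
        yes = [s for d, s in examples if d.get(attr, False)]
        no = [s for d, s in examples if not d.get(attr, False)]
        if not yes or not no:
            continue  # the test does not actually partition the data
        weighted = (len(yes) * variance(yes, dist)
                    + len(no) * variance(no, dist)) / n
        reduction = total - weighted
        if reduction > best[1]:
            best = (attr, reduction)
    return best
```

Applying `best_split` recursively to each resulting subset would grow a tree whose internal nodes are tests on the gene description and whose leaves are clusters of time series; the path of tests leading to a leaf then serves as a symbolic description of that cluster. Because `variance` depends only on pairwise distances, a different time-series distance could be passed via `dist` without changing the rest of the sketch.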
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Džeroski, S., Gjorgjioski, V., Slavkov, I., Struyf, J. (2007). Analysis of Time Series Data with Predictive Clustering Trees. In: Džeroski, S., Struyf, J. (eds) Knowledge Discovery in Inductive Databases. KDID 2006. Lecture Notes in Computer Science, vol 4747. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75549-4_5
DOI: https://doi.org/10.1007/978-3-540-75549-4_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75548-7
Online ISBN: 978-3-540-75549-4
eBook Packages: Computer Science (R0)