Abstract
In recent years, Data Stream Mining (DSM) has received a lot of attention due to the increasing number of application contexts that generate temporally ordered, fast-changing, and potentially infinite data. To deal with such data, learning techniques must satisfy several computational and storage constraints, so new and specific methods have to be developed. In this paper we introduce a new strategy for clustering streaming time series. The method detects a partition of the streams over a user-chosen time period and discovers evolutions in the proximity relations among the streams. We show that these aims can be reached by clustering temporally non-overlapping data batches as they arrive on-line and then running a suitable clustering algorithm on a dissimilarity matrix that is updated using the outputs of the local clustering. Through an application on real and simulated data, we show that this method provides results comparable to those of algorithms for stored data.
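The two-stage scheme described in the abstract (local clustering of non-overlapping batches, then a final clustering on an incrementally updated dissimilarity matrix) can be illustrated with a minimal sketch. The abstract does not specify the local algorithm, the update rule, or the final algorithm, so everything below is an assumption made for illustration: plain k-means is used on each batch, the dissimilarity between two streams is incremented by 1 whenever they fall in different local clusters, and average-linkage agglomeration produces the final partition. The function names (`kmeans`, `update_dissimilarity`, `average_linkage`, `cluster_streams`) are hypothetical, and the on-line arrival of batches is simulated by slicing stored lists.

```python
import random


def kmeans(rows, k, iters=20, seed=0):
    """Plain Lloyd's k-means on equal-length sequences; returns one label per row."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign each row to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(r, centers[c])))
            for r in rows
        ]
        # Recompute each center as the mean of its members; keep it if empty.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def update_dissimilarity(D, labels):
    """Assumed update rule: add 1 for every pair of streams split by the local clustering."""
    n = len(labels)
    for i in range(n):
        for j in range(n):
            if labels[i] != labels[j]:
                D[i][j] += 1.0


def average_linkage(D, k):
    """Agglomerative clustering on a dissimilarity matrix, stopped at k clusters."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(D[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a is still valid
        clusters[a].extend(merged)
    labels = [0] * len(D)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def cluster_streams(streams, window, k_local, k_global):
    """Process non-overlapping windows in order, then cluster the accumulated matrix."""
    n = len(streams)
    D = [[0.0] * n for _ in range(n)]
    length = min(len(s) for s in streams)
    for start in range(0, length - window + 1, window):
        # Only the current batch is needed here; earlier raw data is not revisited.
        batch = [s[start:start + window] for s in streams]
        update_dissimilarity(D, kmeans(batch, k_local))
    return average_linkage(D, k_global), D
```

Calling `cluster_streams(streams, window=4, k_local=2, k_global=2)` on a list of equal-length numeric sequences returns the final labels together with the accumulated matrix. Because only the matrix is kept between batches, memory grows with the number of streams rather than with the length of the streams, which is the kind of storage constraint the abstract refers to.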
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Balzanella, A., Verde, R. (2013). Clustering and Change Detection in Multiple Streaming Time Series. In: Kołodziej, J., Di Martino, B., Talia, D., Xiong, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2013. Lecture Notes in Computer Science, vol 8285. Springer, Cham. https://doi.org/10.1007/978-3-319-03859-9_1
DOI: https://doi.org/10.1007/978-3-319-03859-9_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03858-2
Online ISBN: 978-3-319-03859-9
eBook Packages: Computer Science, Computer Science (R0)