Scalable Similarity Matching in Streaming Time Series

Marascu, Alice; Khan, Suleiman A.; Palpanas, Themis

doi:10.1007/978-3-642-30220-6_19


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7302)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining


Abstract

Nowadays, online monitoring of data streams is essential in many real-life applications, such as sensor network monitoring, manufacturing process control, and video surveillance. One major problem in this area is the online identification of streaming sequences that are similar to a predefined set of pattern sequences.

In this paper, we present a novel solution that extends the state of the art in terms of both effectiveness and efficiency. We propose the first online similarity matching algorithm based on the Longest Common Subsequence (LCSS) that is specifically designed to operate in a streaming context and that can effectively handle time scaling as well as noisy data. To deal with high stream rates and multiple streams, we extend the algorithm to operate on multilevel approximations of the streaming data, thereby quickly pruning the search space. Finally, we incorporate error estimation mechanisms into our approach in order to reduce the number of false negatives.
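For readers unfamiliar with the building blocks named above, the following is a minimal illustrative sketch, not the algorithm proposed in the paper. It shows the classic offline Longest Common Subsequence (LCSS) measure for real-valued series, in the style commonly used for time series, together with a simple piecewise mean approximation of the kind that a multilevel scheme could compare at a coarse level first. The function names and the parameters `epsilon` (value tolerance for noise), `delta` (maximum index shift), and `segments` are assumptions chosen for this example.

```python
from typing import List, Sequence


def lcss_length(a: Sequence[float], b: Sequence[float],
                epsilon: float, delta: int) -> int:
    """Length of the longest common subsequence of two real-valued series.

    Points a[i] and b[j] match when |a[i] - b[j]| <= epsilon (tolerance
    to noise) and |i - j| <= delta (bound on shifting in time).
    """
    n, m = len(a), len(b)
    # table[i][j] holds the LCSS length of the prefixes a[:i] and b[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= epsilon and abs(i - j) <= delta:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def lcss_similarity(a: Sequence[float], b: Sequence[float],
                    epsilon: float, delta: int) -> float:
    """LCSS length normalized by the shorter series, in [0, 1]."""
    if not a or not b:
        return 0.0
    return lcss_length(a, b, epsilon, delta) / min(len(a), len(b))


def piecewise_means(series: Sequence[float], segments: int) -> List[float]:
    """Coarse approximation: the mean of each of `segments` chunks."""
    n = len(series)
    if not 1 <= segments <= n:
        raise ValueError("segments must be between 1 and len(series)")
    means = []
    for k in range(segments):
        # Integer boundaries keep every chunk non-empty when segments <= n.
        start, end = k * n // segments, (k + 1) * n // segments
        chunk = series[start:end]
        means.append(sum(chunk) / len(chunk))
    return means


# Hypothetical pattern and stream window; the outlier 9.0 is skipped.
pattern = [1.0, 2.0, 3.0, 4.0]
window = [1.1, 2.1, 9.0, 3.1, 4.1]
similarity = lcss_similarity(pattern, window, epsilon=0.2, delta=1)
```

In this sketch the outlier in `window` does not lower the score, because LCSS may leave points unmatched; setting `delta=0` forbids any shift and only the first two points match. A coarse comparison on `piecewise_means` output is not guaranteed to preserve every match, which is one reason an approach built on approximations needs a mechanism for limiting false negatives.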

We perform an extensive experimental evaluation using forty real datasets, diverse in nature and characteristics, and we also compare our approach to previous techniques. The experiments demonstrate the validity of our approach.




Author information

Authors and Affiliations

University of Trento, Italy
Alice Marascu & Themis Palpanas
Aalto University, Finland
Suleiman A. Khan


Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Michigan State University, 428 S. Shaw Lane, 48824-1226, East Lansing, MI, USA
Pang-Ning Tan
School of Information Technologies, University of Sydney, 1 Cleveland St., 2006, Sydney, NSW, Australia
Sanjay Chawla
Faculty of Computing and Informatics, Jalan Multimedia, Multimedia University, 63100, Cyberjaya, Selangor, Malaysia
Chin Kuan Ho
Department of Computing and Information Systems, The University of Melbourne, 111 Barry Street, 3053, Melbourne, VIC, Australia
James Bailey


Cite this paper

Marascu, A., Khan, S.A., Palpanas, T. (2012). Scalable Similarity Matching in Streaming Time Series. In: Tan, PN., Chawla, S., Ho, C.K., Bailey, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2012. Lecture Notes in Computer Science, vol 7302. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30220-6_19


DOI: https://doi.org/10.1007/978-3-642-30220-6_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30219-0
Online ISBN: 978-3-642-30220-6
eBook Packages: Computer Science, Computer Science (R0)
