Scalable Similarity Matching in Streaming Time Series

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7302)


Abstract

Online monitoring of data streams is essential in many real-life applications, such as sensor network monitoring, manufacturing process control, and video surveillance. One major problem in this area is the online identification of streaming sequences that are similar to a predefined set of pattern sequences.

In this paper, we present a novel solution that extends the state of the art in terms of both effectiveness and efficiency. We propose the first online similarity matching algorithm based on the Longest Common SubSequence (LCSS) that is specifically designed to operate in a streaming context, and that can effectively handle time scaling as well as noisy data. To deal with high stream rates and multiple streams, we extend the algorithm to operate on multilevel approximations of the streaming data, thereby quickly pruning the search space. Finally, we incorporate error estimation mechanisms into our approach in order to reduce the number of false negatives.

We perform an extensive experimental evaluation using forty real datasets, diverse in nature and characteristics, and we also compare our approach to previous techniques. The experiments demonstrate the validity of our approach.
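To make the central idea of the abstract concrete, the following is a minimal sketch of LCSS-based matching over a stream. It is not the algorithm proposed in the paper: it uses the classic LCSS formulation with a value tolerance and a time-shift bound (in the style of Vlachos et al.) and naively recomputes the score for every sliding window, with no multilevel approximation or error estimation. The names lcss_length, lcss_similarity, match_stream, epsilon, delta and threshold are assumptions chosen for illustration.

```python
from collections import deque


def lcss_length(a, b, epsilon, delta):
    """Length of the longest common subsequence of a and b, where two
    points match if their values differ by at most epsilon and their
    positions differ by at most delta (illustrative parameters)."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close_in_value = abs(a[i - 1] - b[j - 1]) <= epsilon
            close_in_time = abs(i - j) <= delta
            if close_in_value and close_in_time:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def lcss_similarity(a, b, epsilon, delta):
    """LCSS length normalised by the shorter sequence, in [0, 1]."""
    if not a or not b:
        return 0.0
    return lcss_length(a, b, epsilon, delta) / min(len(a), len(b))


def match_stream(stream, pattern, epsilon, delta, threshold):
    """Yield (position, similarity) whenever the most recent window of the
    stream is at least `threshold` similar to the pattern. The window has
    the pattern's length and is rescored from scratch at each step, which
    is a naive baseline rather than an incremental method."""
    window = deque(maxlen=len(pattern))
    for position, value in enumerate(stream):
        window.append(value)
        if len(window) < len(pattern):
            continue
        similarity = lcss_similarity(list(window), pattern, epsilon, delta)
        if similarity >= threshold:
            yield position, similarity


if __name__ == "__main__":
    pattern = [1.0, 2.0, 3.0, 4.0]
    stream = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 0.0]
    for position, similarity in match_stream(stream, pattern, 0.1, 0, 1.0):
        print(position, similarity)
```

Unmatched points are simply skipped by the LCSS recurrence, which is what makes the measure tolerant to noise, while delta bounds how far a match may drift in time. Each window costs time proportional to the product of the window and pattern lengths, which is the kind of per-step cost that pruning on coarser approximations aims to avoid.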




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Marascu, A., Khan, S.A., Palpanas, T. (2012). Scalable Similarity Matching in Streaming Time Series. In: Tan, PN., Chawla, S., Ho, C.K., Bailey, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2012. Lecture Notes in Computer Science, vol 7302. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30220-6_19


  • DOI: https://doi.org/10.1007/978-3-642-30220-6_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30219-0

  • Online ISBN: 978-3-642-30220-6

  • eBook Packages: Computer Science, Computer Science (R0)
