Efficient query filtering for streaming time series with applications to semisupervised learning of time series classifiers

  • Regular Paper
  • Knowledge and Information Systems

Abstract

In this paper, we define time series query filtering: the problem of monitoring a streaming time series for a set of predefined patterns. This problem is of great practical importance given the massive volume of streaming time series available from sensors, medical patient records, financial indices and space telemetry. Because the data may arrive at a high rate and the number of predefined patterns can be relatively large, the comparison algorithm may be unable to keep up. We propose a novel technique that exploits the commonality among the predefined patterns to allow monitoring at higher bandwidths, while maintaining a guarantee of no false dismissals. Our approach is based on the widely used envelope-based lower-bounding technique. As we demonstrate through extensive experiments in diverse domains, our approach achieves large performance improvements in the offline case, and significant improvements in the fastest arrival rate of the data stream that can be processed with a guarantee of no false dismissals. As a further demonstration of its utility, we show that our approach can make semisupervised learning of time series classifiers tractable.
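The envelope-based lower-bounding idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes equal-length patterns, Euclidean distance and a single envelope over all patterns; the function names (build_envelope, envelope_lower_bound, matching_patterns) are hypothetical and are not taken from the paper, whose actual algorithm is not described in this preview. The point it shows is that one comparison against the envelope can rule out every enclosed pattern at once without a false dismissal.

```python
import math

def build_envelope(patterns):
    """Pointwise upper and lower bounds over a set of equal-length patterns."""
    upper = [max(column) for column in zip(*patterns)]
    lower = [min(column) for column in zip(*patterns)]
    return upper, lower

def envelope_lower_bound(candidate, upper, lower):
    """Distance from the candidate to the envelope.

    Every pattern lies between lower and upper at each point, so this value
    never exceeds the true Euclidean distance to any enclosed pattern.
    """
    total = 0.0
    for c, u, l in zip(candidate, upper, lower):
        if c > u:
            total += (c - u) ** 2
        elif c < l:
            total += (l - c) ** 2
    return math.sqrt(total)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def matching_patterns(candidate, patterns, threshold):
    """Indices of patterns within threshold of the candidate (assumed setup)."""
    upper, lower = build_envelope(patterns)
    # One cheap test: if even the envelope is too far away, no pattern matches.
    if envelope_lower_bound(candidate, upper, lower) > threshold:
        return []
    # Otherwise fall back to exact comparisons against each pattern.
    return [i for i, p in enumerate(patterns)
            if euclidean(candidate, p) <= threshold]
```

In this sketch the saving comes from the early return: when the patterns are similar, the envelope is tight and most stream windows are rejected after a single pass. A hierarchy of envelopes over subsets of patterns would be a natural refinement, but that is beyond what this example attempts.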

Author information

Corresponding author

Correspondence to Li Wei.

Additional information

Li Wei is a Ph.D. candidate in the Department of Computer Science & Engineering at the University of California, Riverside. She received her B.S. and M.S. degrees from Fudan University, China. Her research interests include data mining and information retrieval.

Eamonn Keogh is an Assistant Professor of computer science at the University of California, Riverside. His research interests include data mining, machine learning and information retrieval. Several of his papers have won best paper awards, including papers at SIGKDD and SIGMOD. Dr. Keogh is the recipient of a 5-year NSF Career Award for “Efficient Discovery of Previously Unknown Patterns and Relationships in Massive Time Series Databases”.

Helga Van Herle is an Assistant Clinical Professor of medicine at the Division of Cardiology of the Geffen School of Medicine at UCLA. She received her M.D. from UCLA in 1993; completed her residency in internal medicine at the New York Hospital (Cornell University; 1993–1996) and her cardiology fellowship at UCLA (1997–2001). Dr. Van Herle holds an M.Sc. in bioengineering from Columbia University (1987) and a B.Sc. in chemical engineering from UCLA (1985).

Agenor Mafra-Neto, Ph.D., is the CEO of ISCA Technologies, Inc., in California and the founder of ISCA Technologies, LTDA, in Brazil. His research interests include the analysis of insect behavior and communication systems, the manipulation of insect behavior, and the automation of pest monitoring and pest control. Dr. Mafra-Neto is currently coordinating the deployment of area-wide smart sensor and effector networks to micromanage agricultural and public health pests in the field in an automatic fashion.

Russell J. Abbott is a Professor of computer science at California State University, Los Angeles, and a member of the staff at the Aerospace Corporation, El Segundo, CA. His primary interests are in the field of complex systems. He is currently organizing a workshop to bring together people working in the fields of complex systems and systems engineering.

About this article

Cite this article

Wei, L., Keogh, E., Van Herle, H. et al. Efficient query filtering for streaming time series with applications to semisupervised learning of time series classifiers. Knowl Inf Syst 11, 313–344 (2007). https://doi.org/10.1007/s10115-006-0033-7

  • DOI: https://doi.org/10.1007/s10115-006-0033-7
