Abstract
In this paper, we define time series query filtering: the problem of monitoring a streaming time series for a set of predefined patterns. This problem is of great practical importance given the massive volume of streaming time series available from sensors, medical patient records, financial indices and space telemetry. Because the data may arrive at a high rate and the number of predefined patterns can be relatively large, the comparison algorithm may be unable to keep up. We propose a novel technique that exploits the commonality among the predefined patterns to allow monitoring at higher bandwidths while guaranteeing no false dismissals. Our approach is based on the widely used envelope-based lower-bounding technique. As we show with extensive experiments in diverse domains, our approach greatly improves performance in the offline case and significantly raises the fastest arrival rate of the data stream that can be processed with no false dismissals. As a further illustration of its utility, we show that it can make semisupervised learning of time series classifiers tractable.
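The idea of exploiting commonality among patterns through an envelope-based lower bound can be illustrated with a minimal sketch. It is not the implementation from the paper: it assumes Euclidean distance, equal-length sequences, and a single envelope wrapped around all patterns, and the function names (`envelope`, `lower_bound`, `matches`) are hypothetical. The point it shows is that one comparison against the envelope can rule out every enclosed pattern at once, and because the bound never exceeds the true distance, no match is falsely dismissed.

```python
import math


def envelope(patterns):
    """Pointwise upper and lower envelope enclosing every pattern."""
    upper = [max(vals) for vals in zip(*patterns)]
    lower = [min(vals) for vals in zip(*patterns)]
    return upper, lower


def lower_bound(candidate, upper, lower):
    """Euclidean distance from the candidate to the envelope.

    Only points that fall outside the envelope contribute, so the result
    never exceeds the distance to any pattern enclosed by the envelope.
    """
    total = 0.0
    for c, u, l in zip(candidate, upper, lower):
        if c > u:
            total += (c - u) ** 2
        elif c < l:
            total += (l - c) ** 2
    return math.sqrt(total)


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def matches(candidate, patterns, threshold):
    """Indices of the patterns within `threshold` of the candidate."""
    upper, lower = envelope(patterns)
    if lower_bound(candidate, upper, lower) > threshold:
        return []  # all patterns pruned with a single comparison
    # The bound could not prune, so fall back to exact comparisons.
    return [i for i, p in enumerate(patterns)
            if euclidean(candidate, p) <= threshold]
```

In this sketch a candidate far from all patterns costs one envelope comparison instead of one comparison per pattern. A wide envelope gives a loose bound, so in practice similar patterns would be grouped so that each envelope stays tight.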
Author information
Additional information
Li Wei is a Ph.D. candidate in the Department of Computer Science & Engineering at the University of California, Riverside. She received her B.S. and M.S. degrees from Fudan University, China. Her research interests include data mining and information retrieval.
Eamonn Keogh is an Assistant Professor of computer science at the University of California, Riverside. His research interests include data mining, machine learning and information retrieval. Several of his papers have won best paper awards, including papers at SIGKDD and SIGMOD. Dr. Keogh is the recipient of a 5-year NSF Career Award for “Efficient Discovery of Previously Unknown Patterns and Relationships in Massive Time Series Databases”.
Helga Van Herle is an Assistant Clinical Professor of medicine at the Division of Cardiology of the Geffen School of Medicine at UCLA. She received her M.D. from UCLA in 1993; completed her residency in internal medicine at the New York Hospital (Cornell University; 1993–1996) and her cardiology fellowship at UCLA (1997–2001). Dr. Van Herle holds an M.Sc. in bioengineering from Columbia University (1987) and a B.Sc. in chemical engineering from UCLA (1985).
Agenor Mafra-Neto, Ph.D., is the CEO of ISCA Technologies, Inc., in California and the founder of ISCA Technologies, LTDA, in Brazil. His research interests include the analysis of insect behavior and communication systems, the manipulation of insect behavior, and the automation of pest monitoring and pest control. Dr. Mafra-Neto is currently coordinating the deployment of area-wide smart sensor and effector networks to micromanage agricultural and public health pests in the field in an automatic fashion.
Russell J. Abbott is a Professor of computer science at California State University, Los Angeles, and a member of the staff at the Aerospace Corporation, El Segundo, CA. His primary interests are in the field of complex systems. He is currently organizing a workshop to bring together people working in the fields of complex systems and systems engineering.
About this article
Cite this article
Wei, L., Keogh, E., Van Herle, H. et al. Efficient query filtering for streaming time series with applications to semisupervised learning of time series classifiers. Knowl Inf Syst 11, 313–344 (2007). https://doi.org/10.1007/s10115-006-0033-7
DOI: https://doi.org/10.1007/s10115-006-0033-7