DOI: 10.1145/1376616.1376656
research-article

Approximate embedding-based subsequence matching of time series

Published: 09 June 2008

Abstract

A method for approximate subsequence matching is introduced that significantly improves the efficiency of subsequence matching in large time series data sets under the dynamic time warping (DTW) distance measure. Our method is called EBSM, shorthand for Embedding-Based Subsequence Matching. The key idea is to convert subsequence matching to vector matching using an embedding. This embedding maps each database time series into a sequence of vectors, so that every step of every time series in the database is mapped to a vector. The embedding is computed by applying full dynamic time warping between reference objects and each database time series. At runtime, given a query object, an embedding of that object is computed in the same manner, by running dynamic time warping between the reference objects and the query. The query embedding is compared with the database vectors to efficiently identify relatively few areas of interest in the database sequences. Those areas of interest are then fully explored using the exact DTW-based subsequence matching algorithm. Experiments on a large, public time series data set produce speedups of over one order of magnitude compared to brute-force search, with very small losses (< 1%) in retrieval accuracy.
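The abstract does not state the recurrences it relies on. The following is a sketch of one standard formulation of subsequence DTW, in which a match may start and end anywhere in the database series, together with one way the per-step embedding could be defined from it. The notation (D, D_{R,X}, F), the absolute-difference local cost, and the choice to read the query vector off at the query's last step are assumptions made for illustration, not definitions quoted from the paper. Here Q = q_1 ... q_m is the query, X = x_1 ... x_n a database series, R_1, ..., R_d the reference objects, and D_{R,X} the table obtained by running the same recurrence with R in place of Q.

```latex
\begin{align*}
D(0, j) &= 0 \quad (0 \le j \le n), \qquad D(i, 0) = \infty \quad (1 \le i \le m),\\
D(i, j) &= |q_i - x_j| + \min\{D(i-1, j-1),\, D(i-1, j),\, D(i, j-1)\},\\
\mathrm{SDTW}(Q, X) &= \min_{1 \le j \le n} D(m, j),\\
F(X, j) &= \bigl(D_{R_1, X}(|R_1|, j), \dots, D_{R_d, X}(|R_d|, j)\bigr),\\
F(Q) &= \bigl(D_{R_1, Q}(|R_1|, m), \dots, D_{R_d, Q}(|R_d|, m)\bigr).
\end{align*}
```

Because row 0 is all zeros, a match may begin at any position of X, and taking the minimum over the last row lets it end anywhere. Under these assumptions, if the query closely matches a subsequence of X that ends at x_j, F(Q) and F(X, j) can be expected to be similar, which is what would make ranking positions j by the distance between F(Q) and F(X, j) usable as a cheap filter before exact matching.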

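As a rough illustration of the filter-and-refine flow described in the abstract, the following Python sketch builds per-step database vectors offline, embeds a query, ranks every database position by L1 distance to the query vector, and verifies only the top candidates with exact subsequence DTW. All names (end_costs, embed_database, embed_query, ebsm_search) and parameters (num_candidates, window) are hypothetical. The local cost |a - b|, the L1 filter distance, reading the query vector at the query's last step, and a refine window of twice the query length are assumptions; reference objects are simply passed in rather than selected, and queries are assumed to be non-empty.

```python
import math
from typing import List, Optional, Sequence, Tuple

Series = Sequence[float]
Entry = Tuple[int, int, List[float]]  # (series index, end position, vector)


def end_costs(pattern: Series, series: Series) -> List[float]:
    """costs[j] = lowest DTW cost of aligning all of `pattern` with some
    subsequence of `series` that ends exactly at series[j] (start is free)."""
    n = len(series)
    prev = [0.0] * (n + 1)  # row 0 is all zeros, so a match may start anywhere
    for p in pattern:
        cur = [math.inf] * (n + 1)  # column 0 stays infinite
        for j in range(1, n + 1):
            cur[j] = abs(p - series[j - 1]) + min(prev[j - 1], prev[j], cur[j - 1])
        prev = cur
    return prev[1:]


def embed_database(database: Sequence[Series], refs: Sequence[Series]) -> List[Entry]:
    """Offline step: one vector per (series index, end position)."""
    entries: List[Entry] = []
    for s, series in enumerate(database):
        per_ref = [end_costs(r, series) for r in refs]
        for j in range(len(series)):
            entries.append((s, j, [costs[j] for costs in per_ref]))
    return entries


def embed_query(query: Series, refs: Sequence[Series]) -> List[float]:
    """Assumption: the query vector is read off at the query's last step."""
    return [end_costs(r, query)[-1] for r in refs]


def ebsm_search(
    query: Series,
    database: Sequence[Series],
    refs: Sequence[Series],
    db_vectors: Optional[List[Entry]] = None,
    num_candidates: int = 3,
    window: Optional[int] = None,
) -> Tuple[float, int, int]:
    """Filter by L1 distance between embeddings, then refine the top candidates
    with exact subsequence DTW. Returns (cost, series index, end position)."""
    if window is None:
        window = 2 * len(query)  # hypothetical refine-window length
    if db_vectors is None:
        db_vectors = embed_database(database, refs)
    q_vec = embed_query(query, refs)
    # Filter: rank every database end position by L1 distance to the query vector.
    ranked = sorted(
        db_vectors, key=lambda e: sum(abs(a - b) for a, b in zip(q_vec, e[2]))
    )
    best = (math.inf, -1, -1)
    for s, end, _ in ranked[:num_candidates]:
        # Refine: exact subsequence DTW inside a window that ends at the candidate.
        lo = max(0, end - window + 1)
        costs = end_costs(query, database[s][lo : end + 1])
        k = min(range(len(costs)), key=costs.__getitem__)
        if costs[k] < best[0]:
            best = (costs[k], s, lo + k)
    return best


if __name__ == "__main__":
    database = [[5, 5, 1, 2, 3, 5, 5], [0, 0, 9, 8, 7, 0]]
    refs = [[1, 2], [3, 5]]
    print(ebsm_search([1, 2, 3], database, refs))  # (0.0, 0, 4)
```

Passing a precomputed db_vectors list corresponds to the offline embedding step, so per query only the query embedding, the vector comparison, and a few windowed DTW runs remain. The sort still touches every database vector, but each comparison is a cheap d-dimensional distance rather than a DTW computation.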



Information

Published In

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
June 2008
1396 pages
ISBN: 9781605581026
DOI: 10.1145/1376616
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. dynamic time warping
  2. embeddings
  3. filter-and-refine retrieval
  4. similarity indexing
  5. subsequence matching
  6. time series

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '08

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 16
  • Downloads (last 6 weeks): 1
Reflects downloads up to 18 Feb 2025

Citations

Cited By

  • (2024) Accelerating time series similarity search under Move-Split-Merge distance via dissimilarity space embedding. Expert Systems with Applications, vol. 255, article 124889. DOI: 10.1016/j.eswa.2024.124889. Online publication date: Dec-2024.
  • (2022) Boosted-SpringDTW for Comprehensive Feature Extraction of PPG Signals. IEEE Open Journal of Engineering in Medicine and Biology, vol. 3, pp. 78-85. DOI: 10.1109/OJEMB.2022.3174806. Online publication date: 2022.
  • (2020) Efficient and effective similar subtrajectory search with deep reinforcement learning. Proceedings of the VLDB Endowment, vol. 13, no. 12, pp. 2312-2325. DOI: 10.14778/3407790.3407827. Online publication date: 14-Sep-2020.
  • (2020) Machine Learning-Driven Event Characterization under Scarce Vehicular Sensing Data. 2020 IEEE 25th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD), pp. 1-6. DOI: 10.1109/CAMAD50429.2020.9209295. Online publication date: Sep-2020.
  • (2018) Non-Overlapping Subsequence Matching of Stream Synopses. IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 1, pp. 101-114. DOI: 10.1109/TKDE.2017.2725833. Online publication date: 1-Jan-2018.
  • (2018) Optimizing dynamic time warping's window width for time series data mining applications. Data Mining and Knowledge Discovery, vol. 32, no. 4, pp. 1074-1120. DOI: 10.1007/s10618-018-0565-y. Online publication date: 1-Jul-2018.
  • (2017) Interactive Time Series Analytics Powered by ONEX. Proceedings of the 2017 ACM International Conference on Management of Data, pp. 1595-1598. DOI: 10.1145/3035918.3058729. Online publication date: 9-May-2017.
  • (2017) EEG seizure classification based on exploiting phase space reconstruction and extreme learning. Cluster Computing. DOI: 10.1007/s10586-017-1409-z. Online publication date: 30-Nov-2017.
  • (2017) RLC. Pattern Analysis & Applications, vol. 20, no. 2, pp. 601-611. DOI: 10.1007/s10044-016-0577-4. Online publication date: 1-May-2017.
  • (2017) Exploit Every Cycle: Vectorized Time Series Algorithms on Modern Commodity CPUs. Data Management on New Hardware, pp. 18-39. DOI: 10.1007/978-3-319-56111-0_2. Online publication date: 23-Mar-2017.
