Embedding-based subsequence matching in time-series databases

Published: 26 August 2011

Abstract

We propose an embedding-based framework for subsequence matching in time-series databases that improves the efficiency of processing subsequence matching queries under the Dynamic Time Warping (DTW) distance measure. This framework partially reduces subsequence matching to vector matching, using an embedding that maps each query sequence to a vector and each database time series into a sequence of vectors. The database embedding is computed offline, as a preprocessing step. At runtime, given a query object, an embedding of that object is computed online. Relatively few areas of interest are efficiently identified in the database sequences by comparing the embedding of the query with the database vectors. Those areas of interest are then fully explored using the exact DTW-based subsequence matching algorithm. We apply the proposed framework to define two specific methods. The first method focuses on time-series subsequence matching under unconstrained Dynamic Time Warping. The second method targets subsequence matching under constrained Dynamic Time Warping (cDTW), where warping paths are not allowed to stray too much off the diagonal. In our experiments, good trade-offs between retrieval accuracy and retrieval efficiency are obtained for both methods, and the results are competitive with respect to current state-of-the-art methods.
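The abstract describes a filter-and-refine pipeline. The database is embedded offline. At query time, the query's vector is compared with the database vectors, and exact DTW subsequence matching runs only on the few regions that survive. The Python sketch below illustrates that flow under assumptions that are ours, not the paper's:

- The local cost is the absolute difference.
- Each embedding coordinate is the free-start DTW cost of a hypothetical reference sequence, matched so that it ends at a given position. For the query, that position is its last element. For the database series, it is each position j.
- Candidates are ranked by L1 distance.
- Each candidate is refined inside a window of `window_factor * len(query)` elements ending at the candidate position.

All function names, the reference sequences, `num_candidates`, and `window_factor` are illustrative. The `cdtw` function shows the diagonal constraint (a Sakoe-Chiba band of half-width `w`) on whole sequences only. It does not reproduce the paper's cDTW subsequence method.

```python
import math
from typing import List, Sequence, Tuple


def sdtw_last_row(pattern: Sequence[float], series: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Free-start DTW of `pattern` against `series`, absolute-difference local cost.

    For each position j of `series`, returns the cost of the best warping path that
    aligns all of `pattern` with a subsequence ending at j, plus that subsequence's start.
    """
    n = len(series)
    prev = [abs(pattern[0] - x) for x in series]  # a match may start at any position
    prev_start = list(range(n))
    for p in pattern[1:]:
        cur = [prev[0] + abs(p - series[0])] + [0.0] * (n - 1)
        cur_start = [prev_start[0]] + [0] * (n - 1)
        for j in range(1, n):
            # Predecessors: diagonal, vertical, horizontal (ties keep the diagonal).
            best, s = prev[j - 1], prev_start[j - 1]
            if prev[j] < best:
                best, s = prev[j], prev_start[j]
            if cur[j - 1] < best:
                best, s = cur[j - 1], cur_start[j - 1]
            cur[j] = abs(p - series[j]) + best
            cur_start[j] = s
        prev, prev_start = cur, cur_start
    return prev, prev_start


def subsequence_dtw(query: Sequence[float], series: Sequence[float]) -> Tuple[float, int, int]:
    """Exact subsequence DTW: (cost, start, end) of the best match, 0-based inclusive."""
    costs, starts = sdtw_last_row(query, series)
    end = min(range(len(series)), key=costs.__getitem__)
    return costs[end], starts[end], end


def embed_query(query: Sequence[float], refs: List[List[float]]) -> List[float]:
    # Hypothetical coordinate: best match cost of each reference ending at the query's last element.
    return [sdtw_last_row(r, query)[0][-1] for r in refs]


def embed_series(series: Sequence[float], refs: List[List[float]]) -> List[List[float]]:
    # Offline step: one vector per database position j (matches of each reference ending at j).
    rows = [sdtw_last_row(r, series)[0] for r in refs]
    return [[row[j] for row in rows] for j in range(len(series))]


def filter_and_refine(query: Sequence[float], series: Sequence[float], refs: List[List[float]],
                      db_vectors: List[List[float]], num_candidates: int = 3,
                      window_factor: int = 2) -> Tuple[float, int, int]:
    qvec = embed_query(query, refs)
    # Filter: rank end positions by L1 distance between query and database vectors.
    ranked = sorted(range(len(series)),
                    key=lambda j: sum(abs(a - b) for a, b in zip(qvec, db_vectors[j])))
    best = (math.inf, -1, -1)
    for j in ranked[:num_candidates]:
        # Refine: exact subsequence DTW inside a window that ends at candidate position j.
        lo = max(0, j + 1 - window_factor * len(query))
        cost, s, e = subsequence_dtw(query, series[lo:j + 1])
        if cost < best[0]:
            best = (cost, lo + s, lo + e)
    return best


def cdtw(a: Sequence[float], b: Sequence[float], w: int) -> float:
    """Whole-sequence DTW restricted to a Sakoe-Chiba band of half-width w."""
    n, m = len(a), len(b)
    w = max(w, abs(n - m))  # widen the band so that cell (n, m) stays reachable
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            D[i][j] = abs(a[i - 1] - b[j - 1]) + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]


if __name__ == "__main__":
    series = [5.0, 5.0, 1.0, 2.0, 3.0, 5.0, 4.0, 0.0]
    query = [1.0, 2.0, 3.0]
    refs = [[1.0, 2.0], [3.0, 5.0]]  # hypothetical reference sequences
    db = embed_series(series, refs)  # computed once, offline
    print(subsequence_dtw(query, series))               # (0.0, 2, 4)
    print(filter_and_refine(query, series, refs, db))   # (0.0, 2, 4)
    print(cdtw([0, 0, 1, 1], [0, 1, 1, 1], w=0), cdtw([0, 0, 1, 1], [0, 1, 1, 1], w=1))  # 1.0 0.0
```

`embed_series` runs once per database series, so its cost is paid offline. At query time, only `embed_query`, a linear scan over the stored vectors, and a few small DTW computations run. In this toy example, the top-ranked candidate already contains the exact best match. In general, the filter step can miss it, which is the accuracy-versus-efficiency trade-off the abstract refers to.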

Cited By

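  • (2024) A hydrologic similarity-based parameters dynamic matching framework: Application to enhance the real-time flood forecasting. Science of The Total Environment 907 (167767). DOI: 10.1016/j.scitotenv.2023.167767. Online publication date: Jan-2024.
  • (2023) Querying Similar Multi-Dimensional Time Series with a Spatial Database. ISPRS International Journal of Geo-Information 12:4 (179). DOI: 10.3390/ijgi12040179. Online publication date: 21-Apr-2023.
  • (2023) Multi-Querying. Informatica 34:3 (557-576). DOI: 10.15388/23-INFOR519. Online publication date: 28-Jun-2023.
  • (2023) Accelerating Similarity Search for Elastic Measures: A Study and New Generalization of Lower Bounding Distances. Proceedings of the VLDB Endowment 16:8 (2019-2032). DOI: 10.14778/3594512.3594530. Online publication date: 22-Jun-2023.
  • (2023) Connectedness of International Stock Market at Major Public Events: Empirical Study via Dynamic Time Warping-Based Network. Complexity 2023 (1-17). DOI: 10.1155/2023/3172181. Online publication date: 7-Mar-2023.
  • (2023) Efficient Similarity Searches for Multivariate Time Series: A Hash-Based Approach. Information Integration and Web Intelligence (478-490). DOI: 10.1007/978-3-031-48316-5_43. Online publication date: 22-Nov-2023.
  • (2022) Fast Adaptive Similarity Search through Variance-Aware Quantization. 2022 IEEE 38th International Conference on Data Engineering (ICDE) (2969-2983). DOI: 10.1109/ICDE53745.2022.00268. Online publication date: May-2022.
  • (2021) Fault Recognition of Indicator Diagrams Based on the Dynamic Time Warping Distance of Differential Curves. Mathematical Problems in Engineering 2021 (1-7). DOI: 10.1155/2021/6690930. Online publication date: 9-Feb-2021.
  • (2021) Generating Adversarial Samples on Multivariate Time Series using Variational Autoencoders. IEEE/CAA Journal of Automatica Sinica 8:9 (1523-1538). DOI: 10.1109/JAS.2021.1004108. Online publication date: Sep-2021.
  • (2021) CyCo: A Temporal Cycle Consistency Based Labeling Method for Time Series Data. 2021 International Joint Conference on Neural Networks (IJCNN) (1-8). DOI: 10.1109/IJCNN52387.2021.9533633. Online publication date: 2021.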

Published In

ACM Transactions on Database Systems, Volume 36, Issue 3
August 2011
207 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/2000824
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 August 2011
Accepted: 01 March 2011
Revised: 01 January 2011
Received: 01 September 2010
Published in TODS Volume 36, Issue 3

Author Tags

  1. Embedding methods
  2. nearest neighbor retrieval
  3. non-Euclidean spaces
  4. nonmetric spaces
  5. similarity matching

Qualifiers

  • Research-article
  • Research
  • Refereed
