Exact indexing for massive time series databases under time warping distance

Data Mining and Knowledge Discovery

Abstract

Among the many distance measures for time series data, Dynamic Time Warping (DTW) distance is recognized as one of the most accurate and suitable, owing to its flexibility in sequence alignment. However, computing DTW distance is computationally intensive. In very large time series databases, a sequential scan through the entire database is impractical, and random access through an index structure does not solve the problem, because the high dimensionality of time series data incurs extremely high I/O cost. More specifically, a sequential structure incurs high CPU cost but low I/O cost, while an index structure incurs low CPU cost but high I/O cost. In this work, we therefore propose a novel indexed sequential structure called TWIST (Time Warping in Indexed Sequential sTructure), which benefits from both sequential access and an index structure. When a query sequence is issued, TWIST calculates lower-bounding distances between a group of candidate sequences and the query sequence, and then determines the data access order in advance, thereby greatly reducing the number of both sequential and random accesses. Our indexed sequential structure achieves significant speedup in query processing. In addition, our method outperforms existing rival methods in query processing time, number of page accesses, and storage requirement, while guaranteeing no false dismissals.
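The general idea of ordering data access by a lower bound can be illustrated with a small sketch. This is not the TWIST structure itself, whose details are not given in the abstract. It is a generic exact 1-nearest-neighbor search that assumes equal-length sequences, a Sakoe-Chiba band of half-width `r`, and the well-known LB_Keogh envelope bound as the lower-bounding distance. The function names `dtw`, `lb_keogh`, and `nearest_neighbor` are hypothetical and chosen only for illustration.

```python
import math


def dtw(a, b, r):
    """Band-constrained DTW distance (squared point costs, square root at the end)."""
    n, m = len(a), len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # empty prefixes align at zero cost
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        # Only cells within r of the diagonal are allowed (Sakoe-Chiba band).
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return math.sqrt(prev[m])


def lb_keogh(query, candidate, r):
    """LB_Keogh lower bound: distance from the candidate to the query's envelope.

    Assumption: both sequences have the same length.
    """
    n = len(query)
    total = 0.0
    for i, x in enumerate(candidate):
        window = query[max(0, i - r): min(n, i + r + 1)]
        upper, lower = max(window), min(window)
        if x > upper:
            total += (x - upper) ** 2
        elif x < lower:
            total += (lower - x) ** 2
    return math.sqrt(total)


def nearest_neighbor(query, database, r):
    """Exact 1-NN search that visits candidates in ascending lower-bound order."""
    order = sorted((lb_keogh(query, c, r), idx) for idx, c in enumerate(database))
    best_dist, best_idx, computed = math.inf, -1, 0
    for lb, idx in order:
        # Every remaining candidate has a lower bound at least this large,
        # so none of them can beat the best distance found so far.
        if lb >= best_dist:
            break
        d = dtw(query, database[idx], r)
        computed += 1
        if d < best_dist:
            best_dist, best_idx = d, idx
    return best_idx, best_dist, computed
```

Because the lower bound never exceeds the true DTW distance under these assumptions, stopping at the first candidate whose bound reaches the current best cannot discard the true nearest neighbor, which is the sense in which such a search has no false dismissals. The returned `computed` count shows how many full DTW calculations the ordering avoided.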



Corresponding author

Correspondence to Chotirat Ann Ratanamahatana.

Additional information

Responsible editor: Eamonn Keogh.

About this article

Cite this article

Niennattrakul, V., Ruengronghirunya, P. & Ratanamahatana, C.A. Exact indexing for massive time series databases under time warping distance. Data Min Knowl Disc 21, 509–541 (2010). https://doi.org/10.1007/s10618-010-0165-y

  • DOI: https://doi.org/10.1007/s10618-010-0165-y
