Abstract
Among many existing distance measures for time series data, Dynamic Time Warping (DTW) distance has been recognized as one of the most accurate and suitable distance measures due to its flexibility in sequence alignment. However, DTW distance calculation is computationally intensive. In very large time series databases, a sequential scan of the entire database is impractical, and random access through an index structure fares little better, because the high dimensionality of time series data incurs extremely high I/O cost. More specifically, a sequential structure incurs high CPU cost but low I/O cost, while an index structure incurs low CPU cost but high I/O cost. In this work, we therefore propose a novel indexed sequential structure called TWIST (Time Warping in Indexed Sequential sTructure), which benefits from both sequential access and an index structure. When a query sequence is issued, TWIST calculates lower bounding distances between a group of candidate sequences and the query sequence and then determines the data access order in advance, thereby greatly reducing the number of both sequential and random accesses. Our indexed sequential structure achieves significant speedup in query processing. In addition, our method outperforms existing rival methods in terms of query processing time, number of page accesses, and storage requirement, while guaranteeing no false dismissals.
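The general idea of ordering data access by lower bounding distances can be illustrated with a minimal sketch. This is not the TWIST structure itself, whose grouping and storage layout are not described in the abstract. It assumes equal-length sequences, a Sakoe-Chiba band of half-width `window`, and the well-known LB_Keogh envelope bound as a stand-in lower bound; the function names `dtw_distance`, `lb_keogh`, and `nearest_neighbor` are hypothetical.

```python
import math


def dtw_distance(q, c, window):
    """Banded DTW with squared-difference cost (assumed cost function)."""
    n, m = len(q), len(c)
    w = max(window, abs(n - m))  # the band must at least cover the length gap
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return math.sqrt(prev[m])


def lb_keogh(q, c, window):
    """LB_Keogh: distance from c to the band envelope of q (equal lengths)."""
    total = 0.0
    for i, x in enumerate(c):
        segment = q[max(0, i - window): i + window + 1]
        upper, lower = max(segment), min(segment)
        if x > upper:
            total += (x - upper) ** 2
        elif x < lower:
            total += (lower - x) ** 2
    return math.sqrt(total)


def nearest_neighbor(query, database, window):
    """Visit candidates in ascending lower-bound order; stop once the
    smallest remaining bound cannot beat the best exact distance."""
    bounds = sorted(
        (lb_keogh(query, c, window), idx) for idx, c in enumerate(database)
    )
    best_dist, best_idx, dtw_calls = math.inf, -1, 0
    for lb, idx in bounds:
        if lb >= best_dist:
            break  # every remaining candidate has a bound at least this large
        dtw_calls += 1
        d = dtw_distance(query, database[idx], window)
        if d < best_dist:
            best_dist, best_idx = d, idx
    return best_idx, best_dist, dtw_calls


if __name__ == "__main__":
    query = [0, 1, 2, 3, 2, 1, 0, 0]
    database = [
        [5, 5, 5, 5, 5, 5, 5, 5],
        [0, 0, 1, 2, 3, 2, 1, 0],
        [3, 3, 3, 3, 3, 3, 3, 3],
    ]
    print(nearest_neighbor(query, database, window=1))
```

Because the lower bound never exceeds the exact banded DTW distance, stopping at the first bound that is no smaller than the best distance found so far cannot discard the true nearest neighbor, which is the sense in which such a search has no false dismissals.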
Additional information
Responsible editor: Eamonn Keogh.
Cite this article
Niennattrakul, V., Ruengronghirunya, P. & Ratanamahatana, C.A. Exact indexing for massive time series databases under time warping distance. Data Min Knowl Disc 21, 509–541 (2010). https://doi.org/10.1007/s10618-010-0165-y
DOI: https://doi.org/10.1007/s10618-010-0165-y