Abstract
Dynamic time warping (DTW) distance has been used effectively to mine time series data in a multitude of domains. However, in its original formulation DTW is extremely inefficient at comparing long sparse time series, which contain mostly zeros and some unevenly spaced nonzero observations. The original DTW distance does not take advantage of this sparsity, leading to redundant calculations and a prohibitively large computational cost for long time series. We derive a new time warping similarity measure (AWarp) for sparse time series that works on their run-length encoded representation. The complexity of AWarp is quadratic in the number of observations rather than in the time range of the series. Therefore, AWarp can be several orders of magnitude faster than DTW on sparse time series. AWarp is exact for binary-valued time series and a close approximation of the original DTW distance for any-valued series. We discuss useful variants of AWarp: bounded (both upper and lower), constrained, and multidimensional. We show applications of AWarp to three data mining tasks (clustering, classification, and outlier detection) that are otherwise not feasible using classic DTW, while producing equivalent results. Potential areas of application include bot detection, human activity classification, search trend analysis, seismic analysis, and unusual review pattern mining.
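To make the two ingredients named in the abstract concrete, the following minimal Python sketch shows a run-length encoding of the zero runs in a sparse series alongside the classic quadratic DTW recurrence. It is not the AWarp algorithm itself. The function names `rle_zeros` and `dtw`, the `(value, run_length)` pair layout, and the squared pointwise cost are assumptions chosen for illustration and are not taken from the article.

```python
from typing import List, Sequence, Tuple


def rle_zeros(series: Sequence[float]) -> List[Tuple[float, int]]:
    """Run-length encode only the zero runs of a sparse series.

    Each run of zeros becomes (0.0, run_length); each nonzero
    observation is kept as (value, 1). The pair layout is an
    illustrative assumption, not the article's encoding.
    """
    encoded: List[Tuple[float, int]] = []
    zero_run = 0
    for x in series:
        if x == 0:
            zero_run += 1
        else:
            if zero_run:
                encoded.append((0.0, zero_run))
                zero_run = 0
            encoded.append((x, 1))
    if zero_run:
        encoded.append((0.0, zero_run))
    return encoded


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic unconstrained DTW with a squared pointwise cost (assumed).

    Fills a len(a) x len(b) table row by row, so the running time
    grows with the full series lengths, zeros included.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    prev = [0.0] + [inf] * m  # row 0 of the cumulative cost table
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]
```

For a series of length 8 with two nonzero observations, `rle_zeros` yields five pairs, while `dtw` still visits every cell of the full table. That gap between the encoded length and the time range is the inefficiency the abstract describes.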















Acknowledgements
This work was supported by the NSF CCF Grant No. 1527127 and the NSF Graduate Research Fellowship under Grant No. DGE-0237002.
Cite this article
Mueen, A., Chavoshi, N., Abu-El-Rub, N. et al. Speeding up dynamic time warping distance for sparse time series data. Knowl Inf Syst 54, 237–263 (2018). https://doi.org/10.1007/s10115-017-1119-0
DOI: https://doi.org/10.1007/s10115-017-1119-0