Abstract
Two similarity measures are proposed that capture both the numerical and the point distribution characteristics of time series. First, a novel grid representation for time series is presented, with which a time series is segmented and compiled into a matrix format. Based on this grid representation, two matrix matching algorithms, matrix-based Euclidean distance (GMED) and matrix-based dynamic time warping (GMDTW), are adapted to measure the similarity of matrix-like time series. To assess the effectiveness of the proposed similarity measures, 1NN classification and K-means experiments are conducted using 22 online datasets from the UCR time series datasets Web site. In general, the results indicate that the GMDTW measure is more accurate than most current measures, while GMED achieves much higher efficiency than the dynamic time warping algorithm with equivalent performance. Furthermore, the effects of the parameters in the proposed measures are analyzed, and a way to determine their values is given.
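The abstract does not spell out how the grid is built, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that each grid cell stores the number of points falling into it, that the value axis is scaled by each series' own minimum and maximum, and that the matrix distance is a plain element-wise Euclidean distance. The names `grid_matrix` and `matrix_euclidean` are hypothetical, and the dynamic time warping variant is not covered.

```python
from math import sqrt


def grid_matrix(series, rows, cols):
    """Count the points of `series` in each cell of a rows x cols grid.

    Assumption: columns split the time axis evenly and rows split the
    value range [min, max] of this series evenly.
    """
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # avoid division by zero for constant series
    n = len(series)
    grid = [[0] * cols for _ in range(rows)]
    for t, value in enumerate(series):
        c = min(t * cols // n, cols - 1)                    # time segment
        r = min(int((value - lo) / span * rows), rows - 1)  # value band
        grid[r][c] += 1
    return grid


def matrix_euclidean(a, b):
    """Element-wise Euclidean distance between two equally sized matrices."""
    return sqrt(sum((x - y) ** 2
                    for row_a, row_b in zip(a, b)
                    for x, y in zip(row_a, row_b)))


rising = grid_matrix([0, 1, 2, 3], rows=2, cols=2)
falling = grid_matrix([3, 2, 1, 0], rows=2, cols=2)
distance = matrix_euclidean(rising, falling)
```

In this toy case the rising and falling series occupy opposite diagonals of the grid, so their matrices differ in every cell. Comparing real series would normally require a shared value range (for example after z-normalization), which this sketch leaves out.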
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 71571182, 71571185, and 71671186, and the Research Project of National University of Defense Technology. The authors would like to thank the UCR time series archive for providing online datasets and the results of some of the measures. Many thanks to the reviewers for their sound advice, which was very helpful in improving our paper.
Cite this article
Ye, Y., Jiang, J., Ge, B. et al. Similarity measures for time series data classification using grid representation and matrix distance. Knowl Inf Syst 60, 1105–1134 (2019). https://doi.org/10.1007/s10115-018-1264-0
DOI: https://doi.org/10.1007/s10115-018-1264-0