
Similarity measures for time series data classification using grid representation and matrix distance

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Two similarity measures are proposed that capture both the numerical and the point-distribution characteristics of time series. First, a novel grid representation is presented, in which a time series is segmented and compiled into a matrix. Based on this grid representation, two matrix matching algorithms, matrix-based Euclidean distance (GMED) and matrix-based dynamic time warping (GMDTW), are adapted to measure the similarity of time series in matrix form. Finally, to assess the effectiveness of the proposed measures, 1-nearest-neighbor (1NN) classification and K-means clustering experiments are conducted on 22 datasets from the UCR time series archive website. Overall, the results indicate that GMDTW is superior to most current measures in accuracy, while GMED is considerably more efficient than dynamic time warping (DTW) with equivalent performance. Furthermore, the effects of the parameters in the proposed measures are analyzed, and a way of determining their values is given.
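The abstract only outlines the grid representation and the two matrix distances, so the Python/NumPy code below is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each series is z-normalized, that the time axis is cut into `n_cols` equal segments and the value range [-3, 3] into `n_rows` equal bands, and that each grid cell stores the number of points falling into it. GMED is then taken as the element-wise Euclidean distance between two grids, and GMDTW as DTW over grid columns with a column-wise Euclidean local cost. The function names `grid_matrix`, `gmed`, `gmdtw` and `one_nn` and all default parameter values are hypothetical.

```python
import numpy as np

# Hypothetical sketch of a grid representation and two grid distances.
# The binning scheme below is an assumption, not the paper's definition.


def grid_matrix(series, n_rows=4, n_cols=8):
    """Map a 1-D series to an n_rows x n_cols matrix of point counts."""
    x = np.asarray(series, dtype=float)
    # Assumption: z-normalise so that all grids share one value scale.
    std = x.std()
    x = (x - x.mean()) / std if std > 0 else x - x.mean()
    # Time axis: assign each point index to one of n_cols equal segments.
    col = np.arange(len(x)) * n_cols // len(x)
    # Value axis: n_rows equal bands over [-3, 3]; outliers fall in the end bands.
    edges = np.linspace(-3.0, 3.0, n_rows + 1)
    row = np.digitize(x, edges[1:-1])
    grid = np.zeros((n_rows, n_cols))
    np.add.at(grid, (row, col), 1.0)  # count points per cell
    return grid


def gmed(a, b):
    """Assumed GMED: element-wise Euclidean (Frobenius) distance of two grids."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


def gmdtw(a, b):
    """Assumed GMDTW: DTW over grid columns, Euclidean cost between columns."""
    n, m = a.shape[1], b.shape[1]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[:, i - 1] - b[:, j - 1])
            # Standard DTW recurrence: extend the cheapest of the three predecessors.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


def one_nn(train_x, train_y, query, dist, n_rows=4, n_cols=8):
    """1NN classification: return the label of the training series nearest to query."""
    q = grid_matrix(query, n_rows, n_cols)
    dists = [dist(grid_matrix(s, n_rows, n_cols), q) for s in train_x]
    return train_y[int(np.argmin(dists))]
```

For example, with `t = np.linspace(0, 2 * np.pi, 64)`, `one_nn([np.sin(t), t], ["sine", "ramp"], np.sin(t + 0.2), gmdtw)` should return `"sine"`. Because both measures work on a small `n_rows` × `n_cols` grid rather than on the raw points, their cost depends on the grid size: `gmed` is a single pass over the cells, while `gmdtw` fills an `n_cols` × `n_cols` table.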


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7



Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 71571182, 71571185, and 71671186, and by the Research Project of the National University of Defense Technology. The authors thank the UCR time series archive for providing the online datasets and results for some of the measures, and the reviewers for their sound advice, which was very helpful in improving the paper.

Author information


Corresponding author

Correspondence to Yanqing Ye.


About this article


Cite this article

Ye, Y., Jiang, J., Ge, B. et al. Similarity measures for time series data classification using grid representation and matrix distance. Knowl Inf Syst 60, 1105–1134 (2019). https://doi.org/10.1007/s10115-018-1264-0


  • DOI: https://doi.org/10.1007/s10115-018-1264-0
