Elastic distances for time-series classification: Itakura versus Sakoe-Chiba constraints

Geler, Zoltan; Kurbalija, Vladimir; Ivanović, Mirjana; Radovanović, Miloš

doi:10.1007/s10115-022-01725-1

Regular Paper
Published: 12 August 2022

Volume 64, pages 2797–2832, (2022)
Knowledge and Information Systems

Zoltan Geler¹, Vladimir Kurbalija², Mirjana Ivanović² & Miloš Radovanović²

Abstract

In the field of time series data mining, the accuracy of the simple, but very successful nearest neighbor (NN) classifier directly depends on the chosen similarity measure. To improve the efficiency of elastic measures introduced to overcome the shortcomings of Euclidean distance, the Sakoe-Chiba band is usually applied as a constraint. In this paper, we provide a detailed analysis of the influence of the alternative Itakura parallelogram constraint on the accuracy of the NN classifier in combination with four well-known elastic measures, compared to the Sakoe-Chiba constraint and the unconstrained variants of these measures. The findings suggest that, although the Sakoe-Chiba band generally produces better results, for certain types of datasets the Itakura parallelogram represents a better choice.

Data availability

The results presented in this paper are based on the UCR Time Series Classification Archive (https://www.cs.ucr.edu/~eamonn/time_series_data_2018/).


Acknowledgements

V. Kurbalija, M. Ivanović, and M. Radovanović acknowledge financial support of the Ministry of Education, Science and Technological Development of the Republic of Serbia (Grant No. 451-03-68/2022-14/200125). The authors would like to express special thanks to Eamonn Keogh for his valuable comments about the paper, and for collecting and making available the UCR time-series datasets, and also to everyone who contributed data to the collection, without which the presented work would not have been possible.

Funding

Partial financial support was received from the Ministry of Education, Science and Technological Development of the Republic of Serbia (Grant No. 451-03-68/2022-14/200125).

Author information

Authors and Affiliations

Department of Media Studies, Faculty of Philosophy, University of Novi Sad, Dr Zorana Đinđića 2, 21000, Novi Sad, Serbia
Zoltan Geler
Department of Mathematics and Informatics, Faculty of Sciences, University of Novi Sad, Trg D. Obradovića 4, 21000, Novi Sad, Serbia
Vladimir Kurbalija, Mirjana Ivanović & Miloš Radovanović

Contributions

Z. Geler and V. Kurbalija designed and performed the experiments. V. Kurbalija and M. Radovanović provided the expertise for interpreting the results and designing the experiments. Z. Geler was involved in data acquisition and pre-processing. V. Kurbalija, M. Radovanović and M. Ivanović provided valuable feedback and interpretation of the experimental results. Z. Geler drafted the manuscript, while V. Kurbalija, M. Radovanović and M. Ivanović gave comments and made modifications to the manuscript. All authors approved the manuscript. M. Ivanović coordinated the joint research.

Corresponding author

Correspondence to Zoltan Geler.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: The intuition behind the Sakoe-Chiba band and the Itakura parallelogram

The reader may wonder why both the Sakoe-Chiba band and the Itakura parallelogram exist. It may be helpful to revisit DTW in the context of its invention as a speech processing tool. In addition, let us use text as an intuitive proxy for time series. For some speech processing problems, it is easy to precisely locate the beginning and end of an utterance (by the dramatic change in volume). Imagine we wish to compare the name of a Southern US state pronounced with a stuttered "plosive" consonant, Mississippppppi, with a version that has significant sibilance, Missssissssippi. The reader will appreciate that DTW is an ideal tool to recognize that these refer to the same word. Note that for such comparisons there is an interesting constraint. Both utterances start with the same letter “M” and end with the same letter “i”. This means there is no warping at the beginning of the alignment, the warping can grow as we traverse the words, and as we move to the end of the word, the fact that we must end with the same sound again constrains the amount of warping that is possible. The reader will appreciate that this, in geometric terms, describes a parallelogram. The parallelogram-shaped alignment is very common in natural and human processes where there is a mechanism to naturally align the beginning and end of a process, but the amount of lagging or leading is proportional to the distance to the nearer of the beginning or the end. For example, amateur musicians typically start and end in time, but may differ in the middle of a performance.

In contrast, consider the problem of comparing braided and upbraid. While these two words have a lot of common structure, their differences are concentrated at both ends, exactly where the Itakura parallelogram would penalize them by forcing them into an alignment that is almost Euclidean. Such comparisons need just as much flexibility at every point, which the reader will appreciate describes a band.
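The two word pairs above can be tried out with a minimal sketch of constrained DTW. Several things here are assumptions made for illustration and are not taken from the paper: the 0/1 character mismatch cost, the function names, the `radius` parameter of the band, and the `max_slope` parameterization of the parallelogram (a common formulation, which may differ from the one the authors use). Both masks are written for equal-length sequences only.

```python
import numpy as np

def sakoe_chiba_mask(n, m, radius):
    """Band: cell (i, j) is allowed when |i - j| <= radius (assumed parameterization)."""
    i, j = np.indices((n, m))
    return np.abs(i - j) <= radius

def itakura_mask(n, m, max_slope=2.0):
    """Parallelogram: slopes limited to [1/max_slope, max_slope], anchored at both ends."""
    i, j = np.indices((n, m))
    ri, rj = (n - 1) - i, (m - 1) - j  # remaining steps to the last cell
    return ((j <= max_slope * i) & (i <= max_slope * j) &
            (rj <= max_slope * ri) & (ri <= max_slope * rj))

def dtw(a, b, mask=None):
    """DTW with a 0/1 mismatch cost; cells outside the mask cannot be visited."""
    n, m = len(a), len(b)
    if mask is None:
        mask = np.ones((n, m), dtype=bool)  # unconstrained variant
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(n):
        for j in range(m):
            if mask[i, j]:
                cost = 0.0 if a[i] == b[j] else 1.0
                D[i + 1, j + 1] = cost + min(D[i, j], D[i, j + 1], D[i + 1, j])
    return D[n, m]

pairs = [("Mississippppppi", "Missssissssippi"), ("braided", "upbraid")]
for a, b in pairs:
    n, m = len(a), len(b)
    print(a, "vs", b)
    print("  unconstrained:", dtw(a, b))
    for r in (2, 3, 4):
        print(f"  band radius {r}:", dtw(a, b, sakoe_chiba_mask(n, m, r)))
    print("  parallelogram:", dtw(a, b, itakura_mask(n, m)))
```

Under these assumptions, the braided/upbraid pair reaches its unconstrained distance with a band of radius 2, while the parallelogram forbids the early shift and gives a strictly larger distance. The Mississippi pair shows that the outcome is sensitive to the chosen width: with this particular character cost it needs a band radius of at least 4 to align all letters.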

We believe that this consideration exactly explains when the Sakoe-Chiba band or the Itakura parallelogram is most appropriate. For example, in the GunPoint dataset, the creators used a metronome that produced an audible bleep every five seconds; this provided a cue that aligned the ends of the signal snippet. This explains why the Itakura parallelogram works well on this dataset. A good example is the GunPointMaleVersusFemale dataset in combination with the DTW measure. Here, the error rate of the Itakura (IT) version is less than half that of the Sakoe-Chiba (SC) version (0.0024 and 0.0058, respectively). The alignment at the beginning and at the end of the time series is clearly visible in Fig. 27, where several time series from this dataset are plotted.

Another example of good IT performance is found in datasets where images are converted to time series by measuring and recording the local angle at each pixel of an image contour. This idea is nicely shown in Fig. 28 on the FaceFour dataset. The same idea is applied in the Plane dataset, where the contours of 7 different airplanes are used to create a time-series dataset in the same manner. In both datasets, the time series are aligned at the beginning and the end, while in the middle section the variations are significant, as seen in Fig. 29. This makes them more appropriate for the IT constraint, which is evident in the obtained classification error rates:

For the FaceFour dataset using the LCS measure: 0.0089 (IT), 0.0179 (SC)
For the Plane dataset applying the DTW measure: 0.0029 (IT), 0.0038 (SC)

In contrast to datasets suitable for the IT approach, in the datasets where SC performs better, time series do not necessarily need to be aligned at the beginning and at the end. The example datasets with this property are more frequent and can be easily found in the UCR repository. Two such datasets (only a few time series are plotted) are shown in Fig. 30.

In both datasets, SC produced notably lower error rates:

For the GesturePebbleZ2 dataset using DTW: 0.1384 (IT), 0.0671 (SC)
For the BeetleFly dataset applying DTW: 0.2225 (IT), 0.125 (SC)
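As a quick check of the comparisons quoted in this appendix, the short sketch below collects the five reported (IT, SC) error-rate pairs and computes which constraint gives the lower error and by what factor. The numbers are copied from the text above; the dictionary layout and the helper name `compare` are only illustrative.

```python
# (IT error rate, SC error rate) as quoted in the appendix text.
error_rates = {
    ("GunPointMaleVersusFemale", "DTW"): (0.0024, 0.0058),
    ("FaceFour", "LCS"): (0.0089, 0.0179),
    ("Plane", "DTW"): (0.0029, 0.0038),
    ("GesturePebbleZ2", "DTW"): (0.1384, 0.0671),
    ("BeetleFly", "DTW"): (0.2225, 0.125),
}

def compare(it_err, sc_err):
    """Return the constraint with the lower error and the ratio of higher to lower error."""
    better = "IT" if it_err < sc_err else "SC"
    ratio = max(it_err, sc_err) / min(it_err, sc_err)
    return better, ratio

for (dataset, measure), (it_err, sc_err) in error_rates.items():
    better, ratio = compare(it_err, sc_err)
    print(f"{dataset:26s} {measure}: {better} is better, error ratio {ratio:.2f}")
```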

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Geler, Z., Kurbalija, V., Ivanović, M. et al. Elastic distances for time-series classification: Itakura versus Sakoe-Chiba constraints. Knowl Inf Syst 64, 2797–2832 (2022). https://doi.org/10.1007/s10115-022-01725-1

Received: 18 April 2022
Revised: 18 May 2022
Accepted: 16 July 2022
Published: 12 August 2022
Issue Date: October 2022
DOI: https://doi.org/10.1007/s10115-022-01725-1
