Improving time series similarity measures by integrating preprocessing steps

Höppner, Frank

doi:10.1007/s10618-016-0490-x

Published: 11 January 2017

Data Mining and Knowledge Discovery, Volume 31, pages 851–878 (2017)

Frank Höppner (ORCID: orcid.org/0000-0003-4170-5077)

Abstract

While many application papers involving time series data report on the beneficial application of filters, filtering (and preprocessing in general) plays at best a minor role in proposals of similarity measures for time series and in the studies that compare them. We investigate the performance of basic Euclidean distance with integrated preprocessing, namely filtering with automatically derived filters (supervised or unsupervised) and rescaling, and demonstrate that such measures can better respond to typical problems in time series similarity. By accounting for differences in both domains (time and value), we overcome some limitations of elastic measures that focus on time only. Using the proposed measure on real datasets, we achieve performance gains comparable to those of switching from a lock-step measure (Euclidean) to an elastic measure (DTW).
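
As a rough illustration of what folding preprocessing into a lock-step distance can look like, the sketch below smooths both series with a fixed moving-average filter, z-normalizes them, and only then takes the Euclidean distance. This is not the measure proposed in the article: the filter type, its width, the z-normalization as the rescaling step, and all function names are assumptions chosen for illustration, and the automatic derivation of filters described in the abstract is not attempted here.

```python
import math
from typing import List, Sequence


def moving_average(x: Sequence[float], width: int) -> List[float]:
    """Smooth x with a centered window reaching width // 2 samples to each side.

    Windows are truncated at the series boundaries. (Assumed filter, not the
    article's automatically derived one.)
    """
    half = width // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def z_normalize(x: Sequence[float]) -> List[float]:
    """Rescale x to zero mean and unit standard deviation (assumed rescaling)."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    if std == 0.0:
        # A constant series carries no shape information.
        return [0.0 for _ in x]
    return [(v - mean) / std for v in x]


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain lock-step Euclidean distance between equal-length series."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def preprocessed_distance(x: Sequence[float], y: Sequence[float], width: int = 3) -> float:
    """Euclidean distance after filtering and rescaling both series."""
    fx = z_normalize(moving_average(x, width))
    fy = z_normalize(moving_average(y, width))
    return euclidean(fx, fy)


# Hypothetical example data: the second series is a scaled, shifted and
# noise-corrupted copy of the first.
clean = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0]
noisy = [10.0 + 5.0 * v + (2.0 if i % 2 == 0 else -2.0) for i, v in enumerate(clean)]

raw = euclidean(clean, noisy)
integrated = preprocessed_distance(clean, noisy)
```

In this toy setting the raw Euclidean distance is dominated by the offset and scale of the second series, while the preprocessed variant compares shapes after smoothing. A fixed filter is the simplest possible choice; it merely stands in for the data-driven filters the article investigates.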

Notes

Only the artificial data can be shared: http://public.ostfalia.de/~hoeppnef/tsfilter.html.
We found no reference for the type of the Car dataset, but assume it to be a shape dataset due to its similarity to other datasets of this type, e.g. FISH.
Fig. 7: Car (top) and FaceFour (bottom) dataset. For a given reference series (black) several other series (dashed gray) from the same class (left) and another class (right) are filtered and rescaled by LFD (red series) to obtain a higher correspondence with the reference (Color figure online)

Acknowledgements

I would like to thank the reviewers for their useful comments.

Author information

Authors and Affiliations

Department of Computer Science, Ostfalia University of Applied Sciences, Wolfenbüttel, Germany
Frank Höppner

Corresponding author

Correspondence to Frank Höppner.

Additional information

Responsible editor: Eamonn Keogh.

About this article

Cite this article

Höppner, F. Improving time series similarity measures by integrating preprocessing steps. Data Min Knowl Disc 31, 851–878 (2017). https://doi.org/10.1007/s10618-016-0490-x

Received: 04 March 2016
Accepted: 23 December 2016
Published: 11 January 2017
Issue Date: May 2017
DOI: https://doi.org/10.1007/s10618-016-0490-x
