Window Size Selection in Unsupervised Time Series Analytics: A Review and Benchmark

Ermshaus, Arik; Schäfer, Patrick; Leser, Ulf

doi:10.1007/978-3-031-24378-3_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13812)

Included in the following conference series:

International Workshop on Advanced Analytics and Learning on Temporal Data

Abstract

Time series (TS) are sequences of values ordered in time. Such TS have in common that important insights can be drawn from the data by inspecting local substructures rather than the recordings as a whole. ECG recordings, for instance, are characterized by normal or anomalous heartbeats that repeat often within a longer TS. Accordingly, many state-of-the-art time series data mining (TSDM) methods characterize TS by inspecting local substructures. The window size used to extract such subsequences is a crucial hyper-parameter, and an inappropriate value leads to poor TSDM results. Finding the optimal window size has remained one of the most challenging tasks in TSDM, and no domain-agnostic method for learning it is known. We provide, for the first time, a systematic survey and experimental study of six TS window size selection (WSS) algorithms on three diverse TSDM tasks, namely anomaly detection, segmentation and motif discovery, using state-of-the-art TSDM algorithms and benchmarks. We found that WSS methods are competitive with, or even surpass, human annotations if an interesting or anomalous pattern can be attributed to (changes in) the period. This is because current WSS methods aim at finding the period length of a data set, an assumption that, by definition, mostly holds for segmentation and anomaly detection. For motif discovery, however, the results were mixed: motifs can be independent of the period yet repeat unusually often. In this domain, WSS fails and more research is needed.
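The premise that current WSS methods look for the period length, and that the chosen window is then used to cut a TS into subsequences, can be illustrated with a small sketch. It assumes a simple autocorrelation heuristic (take the lag of the highest autocorrelation peak as the window size), which is one common period-based approach; it is not the authors' implementation or any of the six benchmarked algorithms verbatim. The helper names `acf_window_size` and `sliding_windows` and the `min_lag`/`max_lag` defaults are hypothetical.

```python
import numpy as np


def acf_window_size(ts, min_lag=3, max_lag=None):
    """Hypothetical helper: return the lag of the highest autocorrelation
    peak as a window-size estimate, or None if no peak is found.
    Assumes 1 <= min_lag and max_lag < len(ts)."""
    x = np.asarray(ts, dtype=float)
    x = x - x.mean()  # remove the offset so the ACF reflects shape only
    n = len(x)
    if max_lag is None:
        max_lag = n // 2  # assumption: only consider lags up to half the length
    # np.correlate in "full" mode yields lags -(n-1)..(n-1); keep lags >= 0.
    acf = np.correlate(x, x, mode="full")[n - 1:]
    if acf[0] == 0:
        return None  # constant series: no periodic structure to detect
    acf = acf / acf[0]
    # Candidate window sizes are local maxima of the ACF in [min_lag, max_lag).
    peaks = [
        lag for lag in range(min_lag, max_lag)
        if acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]
    ]
    if not peaks:
        return None
    return max(peaks, key=lambda lag: acf[lag])


def sliding_windows(ts, w):
    """Hypothetical helper: all overlapping subsequences of length w."""
    x = np.asarray(ts, dtype=float)
    return np.lib.stride_tricks.sliding_window_view(x, w)


# Usage: a noise-free sine with a period of 50 samples.
ts = np.sin(2 * np.pi * np.arange(1000) / 50)
w = acf_window_size(ts)          # w == 50 for this series
subsequences = sliding_windows(ts, w)  # shape (951, 50)
```

Because the estimate is tied to the dominant period, a heuristic of this kind only helps when the patterns of interest align with that period, which is consistent with the abstract's observation that WSS works for segmentation and anomaly detection but struggles for motif discovery.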

Author information

Authors and Affiliations

Humboldt-Universität zu Berlin, Berlin, Germany
Arik Ermshaus, Patrick Schäfer & Ulf Leser

Corresponding author

Correspondence to Arik Ermshaus.

Editor information

Editors and Affiliations

Inria Grenoble - Rhône-Alpes Research Centre, Villeurbanne, France
Thomas Guyet
University College Dublin, Dublin, Ireland
Georgiana Ifrim
University of Rennes, Rennes, France
Simon Malinowski
University of East Anglia, Norwich, UK
Anthony Bagnall
University of Rennes, Rennes, France
Patrick Shafer
Orange Labs, Lannion, France
Vincent Lemaire

About this paper

Cite this paper

Ermshaus, A., Schäfer, P., Leser, U. (2023). Window Size Selection in Unsupervised Time Series Analytics: A Review and Benchmark. In: Guyet, T., Ifrim, G., Malinowski, S., Bagnall, A., Shafer, P., Lemaire, V. (eds) Advanced Analytics and Learning on Temporal Data. AALTD 2022. Lecture Notes in Computer Science, vol 13812. Springer, Cham. https://doi.org/10.1007/978-3-031-24378-3_6

DOI: https://doi.org/10.1007/978-3-031-24378-3_6
Published: 04 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24377-6
Online ISBN: 978-3-031-24378-3
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships: the ECML PKDD community
