Introducing time series snippets: a new primitive for summarizing long time series

Imani, Shima; Madrid, Frank; Ding, Wei; Crouter, Scott E.; Keogh, Eamonn

doi:10.1007/s10618-020-00702-y

Published: 02 July 2020

Volume 34, pages 1713–1743, (2020)

Data Mining and Knowledge Discovery

Shima Imani ORCID: orcid.org/0000-0001-9616-3178¹,
Frank Madrid¹,
Wei Ding²,
Scott E. Crouter³ &
Eamonn Keogh¹


Abstract

The first question a data analyst asks when confronting a new dataset is often, “Show me some representative/typical data.” In many domains, answering this question is simple: random samples or aggregate statistics of some kind suffice. Surprisingly, it is difficult for large time series datasets. The major difficulty is not time or space complexity, but defining what it means for data of this type to be representative. In this work, we show that the obvious candidate definitions (motifs, shapelets, cluster centers, random samples, etc.) are all poor choices. We introduce time series snippets, a novel representation of typical time series subsequences. Informally, time series snippets can be seen as the answer to the following question: if a user, who could be a human or a higher-level algorithm, only has the resources (including human time) to inspect k subsequences of a long time series, which k subsequences should be chosen? Beyond their utility for visualizing and summarizing massive time series collections, we show that time series snippets are useful for high-level comparison of large time series collections.

Notes

In a sense, midnight is not arbitrary, as it marks the midpoint between sunset and sunrise. However, due to time zones and daylight-saving time, this midpoint rarely coincides with 12 midnight on the clock, so clock midnight is really an arbitrary cultural artifact.

Acknowledgements

We gratefully acknowledge funding from NIH R01HD083431 and from NSF 1544969 and 1510741. Dr. Keogh would also like to acknowledge funding from NetApp.

Author information

Authors and Affiliations

University of California, Riverside, Riverside, USA
Shima Imani, Frank Madrid & Eamonn Keogh
Department of Computer Science, University of Massachusetts Boston, Boston, USA
Wei Ding
Department of Kinesiology, Recreation, and Sport Studies, The University of Tennessee Knoxville, Knoxville, USA
Scott E. Crouter

Corresponding author

Correspondence to Shima Imani.

Additional information

Responsible editor: Panagiotis Papapetrou.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Imani, S., Madrid, F., Ding, W. et al. Introducing time series snippets: a new primitive for summarizing long time series. Data Min Knowl Disc 34, 1713–1743 (2020). https://doi.org/10.1007/s10618-020-00702-y

Received: 03 June 2019
Accepted: 20 June 2020
Published: 02 July 2020
Issue Date: November 2020
DOI: https://doi.org/10.1007/s10618-020-00702-y
