Introducing time series snippets: a new primitive for summarizing long time series

Data Mining and Knowledge Discovery

Abstract

The first question a data analyst asks when confronting a new dataset is often, “Show me some representative/typical data.” In many domains this question is easy to answer with random samples or aggregate statistics. Surprisingly, it is difficult for large time series datasets. The major difficulty is not time or space complexity, but defining what it means for data of this type to be representative. In this work, we show that the obvious candidate definitions (motifs, shapelets, cluster centers, random samples, etc.) are all poor choices. We introduce time series snippets, a novel representation of typical time series subsequences. Informally, time series snippets can be seen as the answer to the following question: if a user, who could be a human or a higher-level algorithm, only has the resources (including human time) to inspect k subsequences of a long time series, which k subsequences should be chosen? Beyond their utility for visualizing and summarizing massive time series collections, we show that time series snippets are useful for high-level comparison of large time series collections.
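To make the "which k subsequences?" question concrete, the following is a minimal sketch of one way to pose it as a greedy coverage problem. It is not the algorithm from the paper: the z-normalized Euclidean distance, the non-overlapping windows, and the names `znorm`, `dist`, and `pick_representatives` are all assumptions chosen for illustration.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in window]


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pick_representatives(series, m, k):
    """Greedily pick up to k non-overlapping length-m windows.

    Each step adds the window that most reduces the total distance from
    every window to its nearest already-chosen window (an illustrative
    stand-in for "coverage"). Returns start indices in the order chosen.
    """
    starts = list(range(0, len(series) - m + 1, m))
    windows = [znorm(series[s:s + m]) for s in starts]
    n = len(windows)
    pairwise = [[dist(a, b) for b in windows] for a in windows]

    # nearest[j] = distance from window j to its closest chosen window so far
    nearest = [math.inf] * n
    chosen = []
    for _ in range(min(k, n)):
        candidates = [c for c in range(n) if c not in chosen]
        best = min(
            candidates,
            key=lambda c: sum(min(nearest[j], pairwise[c][j]) for j in range(n)),
        )
        chosen.append(best)
        nearest = [min(nearest[j], pairwise[best][j]) for j in range(n)]
    return [starts[c] for c in chosen]


# Toy series (assumed data): six ramp-shaped windows, then four square-wave windows.
ramp = [0, 1, 2, 3, 4, 5, 6, 7]
square = [1, 1, 1, 1, -1, -1, -1, -1]
toy_series = ramp * 6 + square * 4
print(pick_representatives(toy_series, m=8, k=2))  # prints [0, 48]
```

On this toy input the sketch returns one window from each regime, with the more common ramp shape chosen first. A real time series would call for a more robust distance measure and a principled choice of window length.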

Notes

  1. In a sense, midnight is not arbitrary, as it marks the midpoint between sunset and sunrise. However, due to time zones and daylight-savings time, it rarely coincides with 12 midnight on the clock. Midnight is really an arbitrary cultural artifact.

Acknowledgements

We gratefully acknowledge funding from NIH R01HD083431 and from NSF 1544969 and 1510741. Dr. Keogh would also like to acknowledge funding from NetAPP.

Author information

Corresponding author

Correspondence to Shima Imani.

Additional information

Responsible editor: Panagiotis Papapetrou.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Imani, S., Madrid, F., Ding, W. et al. Introducing time series snippets: a new primitive for summarizing long time series. Data Min Knowl Disc 34, 1713–1743 (2020). https://doi.org/10.1007/s10618-020-00702-y

  • DOI: https://doi.org/10.1007/s10618-020-00702-y
