Closed motifs for streaming time series classification

  • Regular Paper
  • Knowledge and Information Systems

Abstract

A streaming time series is a continuous, unbounded sequence of chronological observations; such series arise in many scientific and business applications. Motifs, which are frequent subsequences, are highly representative of a time series and play an important role in time series mining. Motif discovery in time series has received much attention in recent years, and several algorithms have been proposed to solve this problem. However, these algorithms can only find motifs of a predefined length, which greatly limits their performance and practicality. More recent algorithms can discover motifs of different lengths, but they require multiple scans of the time series and are therefore not applicable to streaming time series. In addition, it is difficult to determine the optimal length of interesting motifs; a suboptimal choice either misses key motifs or yields too many redundant ones. To overcome this challenge, we introduce the notion of a \(closed\) motif: a motif is \(closed\) if no longer motif has the same number of occurrences. We propose a novel algorithm, \(closedMotif\), that discovers closed motifs in a single scan of a streaming time series. We also use a nearest neighbor classifier with the most distinctive closed motifs to validate their potential for time series classification. Extensive experiments show that our approach efficiently discovers motifs of different lengths. In addition, our closed-motif-based classifier is shown to be more accurate than \(Logical\text{-}Shapelet\), a state-of-the-art time series classifier. Finally, we demonstrate the scalability of \(closedMotif\) on several large datasets from diverse domains such as video surveillance, sensor networks, and biometrics.
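
The closedness condition stated in the abstract can be illustrated with a small brute-force sketch. This is not the \(closedMotif\) algorithm: it scans the data once per candidate length rather than in a single pass, and it assumes the time series has already been discretized into a string of symbols so that occurrences are exact matches. The function names, the `min_count` threshold, and the `require_containment` option are hypothetical; the abstract does not say whether the longer motif must also contain the shorter one, so the option is offered only as one possible reading.

```python
from collections import Counter


def count_subsequences(symbols, min_len, max_len):
    """Count every contiguous subsequence whose length is in [min_len, max_len].

    Overlapping occurrences are counted separately (an assumption of this sketch).
    """
    counts = Counter()
    n = len(symbols)
    for length in range(min_len, max_len + 1):
        for start in range(n - length + 1):
            counts[symbols[start:start + length]] += 1
    return counts


def closed_motifs(symbols, min_len=2, max_len=4, min_count=2,
                  require_containment=False):
    """Return {motif: count} for motifs that pass the closedness test.

    A motif is kept if no strictly longer motif has the same occurrence count.
    With require_containment=True, the longer motif must also contain the
    shorter one (an assumed refinement, not taken from the abstract).
    """
    counts = count_subsequences(symbols, min_len, max_len)
    # Treat any subsequence seen at least min_count times as a motif.
    motifs = {s: c for s, c in counts.items() if c >= min_count}

    closed = {}
    for s, c in motifs.items():
        dominated = any(
            len(t) > len(s) and ct == c
            and (not require_containment or s in t)
            for t, ct in motifs.items()
        )
        if not dominated:
            closed[s] = c
    return closed


if __name__ == "__main__":
    # "bc" and "ca" occur twice, as do the longer "abc", "bca", and "cab",
    # so the two shorter motifs are dropped; "ab" occurs three times and stays.
    print(closed_motifs("abcabcabx", min_len=2, max_len=3, min_count=2))
```

In this toy input the filter discards the length-2 motifs whose occurrence count is matched by a longer motif, which is the redundancy reduction the abstract describes.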

Author information

Corresponding author

Correspondence to Hai-Long Nguyen.

About this article

Cite this article

Nguyen, HL., Ng, WK. & Woon, YK. Closed motifs for streaming time series classification. Knowl Inf Syst 41, 101–125 (2014). https://doi.org/10.1007/s10115-013-0662-6

  • DOI: https://doi.org/10.1007/s10115-013-0662-6
