Abstract
A streaming time series is a continuous, unbounded sequence of chronological observations; such series arise in many scientific and business applications. Motifs, which are frequently occurring subsequences, are highly representative of a time series and play an important role in time series mining. Motif discovery in time series has received much attention in recent years, and several algorithms have been proposed to solve this problem. However, these algorithms can only find motifs of a predefined length, which greatly limits their performance and practicality. More recent algorithms can discover motifs of different lengths, but they require multiple scans of the time series and are therefore not applicable to streaming time series. In addition, it is difficult to determine the optimal length of interesting motifs: a suboptimal choice results in either missing key motifs or producing too many redundant ones. To overcome this challenge, we introduce the notion of a \(closed\) motif: a motif is \(closed\) if there is no longer motif with the same number of occurrences. We propose a novel algorithm, \(closedMotif\), that discovers closed motifs in a single scan of a streaming time series. We also use a nearest neighbor classifier built on the most distinctive closed motifs to validate their potential for time series classification. Extensive experiments show that our approach can efficiently discover motifs of different lengths. In addition, our closed-motif-based classifier is shown to be more accurate than \(Logical\text{-}Shapelet\), a state-of-the-art time series classifier. Finally, we demonstrate the scalability of \(closedMotif\) on several large datasets from diverse domains such as video surveillance, sensor networks, and biometrics.
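The closedness condition can be illustrated with a small sketch. This is not the \(closedMotif\) algorithm: it is a brute-force, multi-pass filter over a short string of symbols, and it rests on several assumptions that the abstract does not state. The series is assumed to be already discretized into symbols, occurrences are counted as exact (possibly overlapping) substring matches, and the longer motif is assumed to have to contain the shorter one, as in closed pattern mining. The names `count_substrings`, `closed_motifs`, `min_len`, `max_len`, and `min_count` are hypothetical.

```python
from collections import Counter


def count_substrings(symbols, min_len, max_len):
    """Count every contiguous substring whose length is in [min_len, max_len]."""
    counts = Counter()
    n = len(symbols)
    for length in range(min_len, max_len + 1):
        for start in range(n - length + 1):
            counts[symbols[start:start + length]] += 1
    return counts


def closed_motifs(symbols, min_len=2, max_len=None, min_count=2):
    """Return {motif: count} for frequent substrings that are closed.

    Assumption (not from the abstract): a motif is dropped only when a
    longer frequent motif contains it and has the same occurrence count.
    """
    if max_len is None:
        max_len = len(symbols)
    counts = count_substrings(symbols, min_len, max_len)
    # Keep only substrings that repeat often enough to count as motifs.
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    closed = {}
    for motif, count in frequent.items():
        absorbed = any(
            len(other) > len(motif) and other_count == count and motif in other
            for other, other_count in frequent.items()
        )
        if not absorbed:
            closed[motif] = count
    return closed


if __name__ == "__main__":
    # "bc" occurs twice, but so does the longer "abc", so "bc" is not closed.
    # "ab" occurs three times and no longer motif matches that count.
    print(closed_motifs("abcxabcyab"))
```

In this toy input the frequent substrings are `ab` (3 occurrences), `bc` (2), and `abc` (2); only `ab` and `abc` survive the filter, which shows how closedness removes redundant shorter motifs without fixing a motif length in advance.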
Cite this article
Nguyen, HL., Ng, WK. & Woon, YK. Closed motifs for streaming time series classification. Knowl Inf Syst 41, 101–125 (2014). https://doi.org/10.1007/s10115-013-0662-6
DOI: https://doi.org/10.1007/s10115-013-0662-6