Abstract
Time series motifs are repeated patterns in long and noisy time series. Motifs are typically used to understand the dynamics of the source because repeated patterns with high similarity evidentially rule out the presence of noise. Recently, time series motifs have also been used for clustering, summarization, rule discovery and compression as features. For all such purposes, many high-quality motifs of various lengths are desirable and thus originate the problem of enumerating motifs for a wide range of lengths. Existing algorithms find motifs for a given length. A trivial way to enumerate motifs is to run one of the algorithms for the whole range of lengths. However, such parameter sweep is computationally infeasible for large real datasets. In this paper, we describe an exact algorithm, called \({\textit{MOEN}}\), to enumerate motifs. The algorithm is an order of magnitude faster than the naive algorithm. The algorithm frees us from re-discovering the same motif at different lengths and tuning multiple data-dependent parameters. The speedup comes from using a novel bound on the similarity function across lengths and the algorithm uses only linear space unlike other motif discovery algorithms. We also describe an approximate extension of MOEN algorithm that is faster and suitable for larger datasets. We describe five case studies in entomology, sensor fusion, power consumption monitoring and activity recognition where \({\textit{MOEN}}\) enumerates several high-quality motifs.


















Similar content being viewed by others
References
Supporting webpage. http://www.cs.unm.edu/~mueen/Projects/MOEN.html
topsider.com marathon match. http://community.topcoder.com/tc?module=MatchDetails&rd=15618
Webpage to download the benchmark dataset for context recognition. http://cis.legacy.ics.tkk.fi/jhimberg/contextdata/scenarios.html
Brown AEX, Yemini EI, Grundy LJ, Jucikas T, Schafer WR (2013) A dictionary of behavioral motifs reveals clusters of genes affecting caenorhabditis elegans locomotion. Proc Natl Acad Sci 110(2):791–796
Cassisi C, Aliotta M, Cannata A, Montalto P, Patanè D, Pulvirenti A, Spampinato L (2013) Motif discovery on seismic amplitude time series: the case study of Mt. Etna 2011 eruptive activity. Pure Appl Geophys 170(4):529–545
Castro N, Azevedo P (2010) Multiresolution motif discovery in time series. In: Proceedings of the 2010 SIAM international conference on data mining, pp 665–676. http://epubs.siam.org/doi/abs/10.1137/1.9781611972801.73
Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PC, Mark RG, Mietus JE, Moody GB, Peng C-K, Stanley HE (2000) Physiobank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals. Circulation 101(23):e215–e220
Keogh E, Kasetty S (2003) On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Min Knowl Discov 7(4):349–371
Lam HT, Pham ND, Calders T (2011) Online discovery of top-k similar motifs in time series data. In: SIAM Conference on Data Mining, SDM ’11
Oates T, Lin J, Li Y (2012) Visualizing variable-length time series motifs. In: Proceedings of the 2012 SIAM international conference on data mining, pp 895–906. http://epubs.siam.org/doi/abs/10.1137/1.9781611972825.77
Lin J, Keogh E, Lonardi S, Patel P (2002) Finding motifs in time series. In: Proceedings of 2nd workshop on temporal data mining at KDD, pp 53–68
Makonin S, Popowich F, Bartram L, Gill B, Bajic IV (2013) AMPds: a public dataset for load disaggregation and eco-feedback research. In: Electrical power and energy conference (EPEC), 2013 IEEE, pp 1–6
Mäntyjärvi J, Himberg J, Kangas P, Tuomela U, Huuskonen P (2004) Sensor signal data set for exploring context recognition of mobile devices. In: Workshop “Benchmarks and a database for context recognition” in conjunction with the 2nd international conference on pervasive computing (PERVASIVE 2004)
Mueen A (2013) Enumeration of time series motifs of all lengths. ICDM
Mueen A, Keogh E (2010) Online discovery and maintenance of time series motifs. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’10. ACM, Washington, DC, pp 1089–1098. ISBN: 978-1-4503-0055-1
Mueen A, Keogh E (2011) Logical-shapelets: an expressive primitive for time series classification. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’11. ACM, New York, pp 1154–1162. ISBN: 978-1-4503-0813-7
Mueen A, Zhu Q, Cash S, Keogh E, Westover B (2009) Exact discovery of time series motifs. In: Proceedings of the 2009 SIAM International Conference on Data Mining, pp 473–484. http://epubs.siam.org/doi/abs/10.1137/1.9781611972795.41
Mueen A, Nath S, Liu J (2010) Fast approximate correlation for massive time-series data. In: SIGMOD conference, pp. 171–182
Narang A, Bhattcherjee S (2011) Real-time approximate range motif discovery & data redundancy removal algorithm. In: Proceedings of the 14th international conference on extending database technology, EDBT/ICDT ’11. ACM, New York, pp 485–496. ISBN: 978-1-4503-0528-0
Nunthanid P, Niennattrakul V, Ratanamahatana C (2011) Discovery of variable length time series motif. In: 2011 8th international conference on electrical engineering/electronics, computer, telecommunications and information technology (ECTI-CON), pp 472–475
Patel P, Keogh E, Lin J, Lonardi S (2002) Mining motifs in massive time series databases. In: Proceedings of the IEEE international conference on data mining, ICDM
Pohl H, Hadjakos A (2010) Dance pattern recognition using dynamic time warping. In: SMC 2010 proceedings, pp 183–190
Rakthanmanon T, Campana B, Mueen A, Batista G, Westover B, Zhu Q, Zakaria J, Keogh E (2012) Searching and mining trillions of time series subsequences under dynamic time warping. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’12. ACM, New York, pp 262–270. ISBN: 978-1-4503-1462-6
Rakthanmanon T, Keogh EJ, Lonardi S, Evans S (2011) Time series epenthesis: clustering time series streams requires ignoring some data. In: Proceedings of the 2011 IEEE 11th international conference on data mining, ICDM ’11, pp 547–556
Sakurai Y, Papadimitriou S, Faloutsos C (2005) Braid: stream mining through group lag correlations. In: SIGMOD conference, pp 599–610
Tanaka Y, Iwamoto K, Uehara K (2005) Discovery of time-series motif from multi-dimensional data based on MDL principle. Mach Learn 58:269–300
Tang H, Liao SS (2008) Discovering original motifs with different lengths from time series. Knowl Based Syst 21:666–671
Yankov D, Keogh E, Medina J, Chiu B, Zordan V (2007) Detecting time series motifs under uniform scaling. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’07, pp 844–853
Ye L, Keogh E (2009) Time series shapelets: a new primitive for data mining. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD, pp 947–956
Yingchareonthawornchai S, Sivaraks H, Rakthanmanon T, Ratanamahatana C (2013) Efficient proper length time series motif discovery. In: 2013 IEEE 13th international conference on data mining (ICDM), pp 1265–1270
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Mueen, A., Chavoshi, N. Enumeration of time series motifs of all lengths. Knowl Inf Syst 45, 105–132 (2015). https://doi.org/10.1007/s10115-014-0793-4
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-014-0793-4