Abstract
Detecting repeated variable-length patterns, also called variable-length motifs, has received a great amount of attention in recent years. The state-of-the-art algorithm utilizes a fixed-length motif discovery algorithm as a subroutine to enumerate variable-length motifs. As a result, it may take hours or days to execute when the enumeration range is large. In this work, we introduce an approximate algorithm called hierarchical-based motif enumeration (HIME) to detect variable-length motifs with a large enumeration range in million-scale time series. We show in the experiments that the scalability of the proposed algorithm is significantly better than that of the state-of-the-art algorithm. Moreover, the motif length range detected by HIME is considerably larger than previous sequence matching-based approximate variable-length motif discovery approach. We demonstrate that HIME can efficiently detect meaningful variable-length motifs in long, real-world time series.




















Similar content being viewed by others
References
Begum N, Keogh E (2014) Rare time series motif discovery from unbounded streams. Proc VLDB Endow 8(2):149–160
Buza K, Schmidt-Thieme L (2010) Motif-based classification of time series with Bayesian networks and SVMS. In: Fink A, Lausen B, Seidel W, Ultsch A (eds) Advances in data analysis, data handling and business intelligence. Springer, Berlin, pp 105–114
Castro N, Azevedo PJ (2010) Multiresolution motif discovery in time series. In: Proceedings of the 2010 SIAM international conference on data mining. SIAM, pp 665–676
Chiu B, Keogh E, Lonardi S (2003) Probabilistic discovery of time series motifs. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 493–498
Gao Y, Li Q, Li X, Lin J, Rangwala H (2017) Trajviz: A tool for visualizing patterns and anomalies in trajectory. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp 428–431. Springer
Gao Y, Lin J (2017) Efficient discovery of time series motifs with large length range in million scale time series. In: Data Mining (ICDM), 2017 IEEE International Conference on, pp 1213–1222. IEEE
Gao Y, Lin J (2018) Exploring variable-length time series motifs in one hundred million length scale. Data Min Knowl Discov 32(5):1200–1228
Gao Y, Lin J, Rangwala H (2016) Iterative grammar-based framework for discovering variable-length time series motifs. In: 2016 15th IEEE international conference on machine learning and applications (ICMLA). IEEE, pp 7–12
Goldberger AL, Amaral LA, Glass L, Hausdorff JM, Ivanov PC, Mark RG, Mietus JE, Moody GB, Peng C-K, Stanley HE (2000) Physiobank, physiotoolkit, and physionet components of a new research resource for complex physiologic signals. Circulation 101(23):e215–e220
Jessica Lin SL, Keogh E, Patel P (2002) Finding motifs in time series. In: Proceedings of the 2nd workshop on temporal data mining, pp 53–68
Keogh E, Lin J, Fu A (2005) Hot sax: efficiently finding the most unusual time series subsequence. In: 2005 IEEE 5th international conference on data mining (ICDM), p 8
Li Y, Lin J, Oates T (2012) Visualizing variable-length time series motifs. In: Proceedings of the 2012 SIAM international conference on data mining. SIAM, pp 895–906
Li Y, Yiu ML, Gong Z, et al (2015) Quick-motif: an efficient and scalable framework for exact motif discovery. In: 2015 IEEE 31st international conference on data engineering (ICDE). IEEE, pp 579–590
Lin J, Keogh E, Lonardi S, Lankford JP, Nystrom DM (2004) Visually mining and monitoring massive time series. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 460–469
Lin J, Keogh E, Wei L, Lonardi S (2007) Experiencing sax: a novel symbolic representation of time series. Data Min Knowl Discov 15(2):107–144
Linardi M, Zhu Y, Palpanas T, Keogh E (2018) Matrix profile x: Valmod-scalable discovery of variable-length motifs in data series. In: Proceedings of the 2018 international conference on management of data. ACM, pp 1053–1066
Lines J, Davis LM, Hills J, Bagnall A (2012) A shapelet transform for time series classification. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 289–297
Liu B, Li J, Chen C, Tan W, Chen Q, Zhou M (2015) Efficient motif discovery for large-scale time series in healthcare. IEEE Trans Ind Inf 11(3):583–590
Maaten Lvd, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(Nov):2579–2605
Meng J, Yuan J, Hans M, Wu Y (2008) Mining motifs from human motion. In: Proceedings of EUROGRAPHICS, vol 8
Minnen D, Starner T, Essa I, Isbell C (2006) Discovering characteristic actions from on-body sensor data. In: 2006 10th IEEE international symposium on wearable computers. IEEE, pp 11–18
Mohammad Y, Nishida T (2014) Exact discovery of length-range motifs. In: Intelligent information and database systems, pp 23–32. Springer, Berlin
Mohammad Y, Nishida T (2014) Scale invariant multi-length motif discovery. In: Modern advances in applied intelligence. Springer, Berlin, pp 417–426
Mueen A (2013) Enumeration of time series motifs of all lengths. In: 2013 IEEE 13th international conference on data mining (ICDM). IEEE, pp 547–556
Mueen A, Keogh E (2010) Online discovery and maintenance of time series motifs. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 1089–1098
Mueen A, Keogh E, Young N (2011) Logical-shapelets: an expressive primitive for time series classification. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 1154–1162
Mueen A, Keogh EJ, Zhu Q, Cash S, Westover MB (2009) Exact discovery of time series motifs. In: Proceedings of the 2009 SIAM international conference on data mining. SIAM, pp 473–484
Mueen A, Zhu Y, Yeh M, Kamgar K, Viswanathan K, Gupta C, Keogh E (2015) The fastest similarity search algorithm for time series subsequences under Euclidean distance. http://www.cs.unm.edu/~mueen/FastestSimilaritySearch.html
Murray D, Liao J, Stankovic L, Stankovic V, Hauxwell-Baldwin R, Wilson C, Coleman M, Kane T, Firth S (2015) A data management platform for personalised real-time energy feedback. In: Proceedings of the 8th international conference on energy efficiency in domestic appliances and lighting, pp 1–15
Nevill-Manning CG, Witten IH (1997) Identifying hierarchical structure in sequences: a linear-time algorithm. J Artif Intell Res (JAIR) 7:67–82
Nunthanid P, Niennattrakul V, Ratanamahatana CA (2011) Discovery of variable length time series motif. In: 2011 8th international conference on electrical engineering/electronics, computer, telecommunications and information technology (ECTI-CON). IEEE, pp 472–475
Rakthanmanon T, Campana B, Mueen A, Batista G, Westover B, Zhu Q, Zakaria J, Keogh E (2012) Searching and mining trillions of time series subsequences under dynamic time warping. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 262–270
Senin P, Lin J, Wang X, Oates T, Gandhi S, Boedihardjo AP, Chen C, Frankenstein S, Lerner M (2014) Grammarviz 2.0: a tool for grammar-based pattern discovery in time series. In: Machine learning and knowledge discovery in databases. Springer, Berlin, pp 468–472
Shokoohi-Yekta M, Chen Y, Campana B, Hu B, Zakaria J, Keogh E (2015) Discovery of meaningful rules in time series. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 1085–1094
Skaletsky H, Kuroda-Kawaguchi T, Minx PJ, Cordum HS, Hillier L, Brown LG, Repping S, Pyntikova T, Ali J, Bieri T et al (2003) The male-specific region of the human y chromosome is a mosaic of discrete sequence classes. Nature 423(6942):825–837
Tang H, Liao SS (2008) Discovering original motifs with different lengths from time series. Knowl Based Syst 21(7):666–671
Wang X, Lin J, Senin P, Oates T, Gandhi S, Boedihardjo AP, Chen C, Frankenstein S (2016) Rpm: representative pattern mining for efficient time series classification. In: 19th international conference on extending database technology (EDBT), pp 185–196
Bob P, Willem-Pier V, Sander P, Jonathon J (2005) Xeno-Canto. www.xeno-canto.org. Accessed 30 May 2005
Ye L, Keogh E (2009) Time series shapelets: a new primitive for data mining. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 947–956
Yeh C-CM, Zhu Y, Ulanova L, Begum N, Ding Y, Dau HA, Silva DF, Mueen A, Keogh E (2016) Matrix profile i: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In: 2016 IEEE 16th international conference on data mining (ICDM), pp 1317–1322
Zhang X, Zhao L, Boedihardjo AP, Lu C-T, Ramakrishnan N (2017) Spatiotemporal event forecasting from incomplete hyper-local price data. In: Proceedings of the 2017 ACM on conference on information and knowledge management. ACM, pp 507–516
Zhu Y, Schall-Zimmerman Z, Senobari NS, Yeh C-CM, Funning G, Mueen A, Brisk P, Keogh EJ (2016) Matrix profile II: exploiting a novel algorithm and gpus to break the one hundred million barrier for time series motifs and joins. In: 2016 IEEE 16th international conference on data mining (ICDM), pp 739–748
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Gao, Y., Lin, J. HIME: discovering variable-length motifs in large-scale time series. Knowl Inf Syst 61, 513–542 (2019). https://doi.org/10.1007/s10115-018-1279-6
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-018-1279-6