DOI: 10.1145/3183713.3183744

Matrix Profile X: VALMOD - Scalable Discovery of Variable-Length Motifs in Data Series

Published: 27 May 2018

Abstract

In the last fifteen years, data series motif discovery has emerged as one of the most useful primitives for data series mining, with applications in many domains, including robotics, entomology, seismology, medicine, and climatology. Nevertheless, state-of-the-art motif discovery tools still require the user to provide the motif length. Yet, in at least some cases, the choice of motif length is critical and unforgiving. Unfortunately, the obvious brute-force solution, which tests all lengths within a given range, is computationally untenable. In this work, we introduce VALMOD, an exact and scalable motif discovery algorithm that efficiently finds all motifs in a given range of lengths. We evaluate our approach with five diverse real datasets, and demonstrate that it is up to 20 times faster than the state of the art. Our results also show that removing the unrealistic assumption that the user knows the correct length can often produce more intuitive and actionable results, which could otherwise have been missed.
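
To make the brute-force baseline mentioned in the abstract concrete, the following is a minimal sketch of it and not VALMOD itself. It assumes that a motif is the closest pair of non-overlapping subsequences under z-normalized Euclidean distance, which is a common definition but is not stated in the abstract. The function names (`znorm`, `motif_pair`, `brute_force_variable_length`) are hypothetical and chosen only for illustration.

```python
import math


def znorm(seq):
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0.0:
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def motif_pair(series, length):
    """Closest pair of non-overlapping subsequences of a single length.

    Returns (distance, i, j), or None when no non-overlapping pair exists.
    """
    count = len(series) - length + 1
    subs = [znorm(series[i:i + length]) for i in range(count)]
    best = None
    for i in range(count):
        # Assumption: starting j at i + length forbids any overlap, a
        # simple and conservative way to exclude trivial matches.
        for j in range(i + length, count):
            d = math.dist(subs[i], subs[j])
            if best is None or d < best[0]:
                best = (d, i, j)
    return best


def brute_force_variable_length(series, min_len, max_len):
    """Repeat the single-length search for every length in the range."""
    results = {}
    for length in range(min_len, max_len + 1):
        pair = motif_pair(series, length)
        if pair is not None:
            results[length] = pair
    return results
```

Each call to `motif_pair` compares on the order of n² subsequence pairs at a cost proportional to the length, and the outer loop repeats that work for every length in the range, which illustrates why the abstract calls this approach untenable on large series. The sketch also leaves open how distances at different lengths should be compared with one another.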



    Published In

    SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
    May 2018
    1874 pages
    ISBN:9781450347037
    DOI:10.1145/3183713

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data mining
    2. data series
    3. motif discovery
    4. time series
    5. variable length

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS '18

    Acceptance Rates

    SIGMOD '18 Paper Acceptance Rate 90 of 461 submissions, 20%;
    Overall Acceptance Rate 785 of 4,003 submissions, 20%
