Abstract
Many time series data mining algorithms work by reasoning about the relationships between the conserved shapes of subsequences. To facilitate this, the Matrix Profile is a data structure that annotates a time series by recording each subsequence's Euclidean distance to its nearest neighbor. In recent years, the community has shown that, using the Matrix Profile, it is possible to discover many useful properties of a time series, including repeated behaviors (motifs), anomalies, evolving patterns, regimes, etc. However, the Matrix Profile is limited to representing the relationships between the subsequences' shapes. It is understood that, for some domains, useful information is conserved not in the subsequences' shapes, but in the subsequences' features. In recent years, a new set of features for time series called catch22 has revolutionized feature-based mining of time series. Combining these two ideas seems to offer many possibilities for novel data mining applications; however, there are two difficulties in attempting this. First, a direct application of the Matrix Profile with the catch22 features would be prohibitively slow. Second, and less obviously, as we will demonstrate, in almost all domains, using all twenty-two of the catch22 features produces poor results, and we must somehow select the subset appropriate for the domain. In this work, we introduce novel algorithms to solve both problems and demonstrate that, for most domains, the proposed C22MP is a state-of-the-art anomaly detector.
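To make the Matrix Profile definition above concrete, the following is a minimal brute-force sketch in Python. It is not the C22MP algorithm, and it is exactly the kind of direct computation the abstract describes as prohibitively slow on long series. The function names (`naive_matrix_profile`, `znorm`, `toy_features`), the choice of z-normalization, and the exclusion zone of half the subsequence length are assumptions made for illustration. In particular, `toy_features` is a hypothetical three-feature stand-in and does not compute any of the catch22 features.

```python
import numpy as np

def znorm(x):
    """Z-normalize a 1-D array; a constant window maps to all zeros."""
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)

def toy_features(x):
    """Hypothetical stand-in feature vector (NOT the catch22 features)."""
    return np.array([x.mean(), x.std(), np.abs(np.diff(x)).mean()])

def naive_matrix_profile(ts, m, embed=znorm):
    """For each length-m subsequence, return the Euclidean distance to its
    nearest non-trivial neighbor after mapping subsequences with `embed`."""
    ts = np.asarray(ts, dtype=float)
    n = len(ts) - m + 1
    # Represent every subsequence as a vector: its shape or its features.
    vecs = np.array([embed(ts[i:i + m]) for i in range(n)])
    excl = m // 2  # assumed exclusion zone to skip trivially overlapping matches
    profile = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(vecs - vecs[i], axis=1)
        d[max(0, i - excl):min(n, i + excl + 1)] = np.inf
        profile[i] = d.min()
    return profile

if __name__ == "__main__":
    t = np.arange(1000)
    ts = np.sin(2 * np.pi * t / 50)
    ts[500:525] = 0.0  # inject a flat segment as a synthetic anomaly
    shape_mp = naive_matrix_profile(ts, m=50)
    feature_mp = naive_matrix_profile(ts, m=50, embed=toy_features)
    print("shape-based discord starts near", int(np.argmax(shape_mp)))
    print("feature-based discord starts near", int(np.argmax(feature_mp)))
```

The highest value in either profile marks the subsequence that is farthest from its nearest neighbor, which is the usual anomaly (discord) candidate. Swapping `embed` is the only change needed to move from comparing shapes to comparing features, which is the combination the abstract motivates.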























Data availability
C22MP (2022) Supporting webpage: sites.google.com/view/c22mp/home.
Notes
The two most cited datasets for evaluating TSAD algorithms are tiny: NY-Taxi (length 10,320) and Yahoo! Webscope (mean length 1,415) [49].
This contrived example is not as implausible as it may seem. Suppose we are monitoring the accelerometer time series from a smartphone in a user’s pocket. If the user takes a call, and then returns the phone to her pocket upside down, the Y-axis time series will flip upside down, but will not be flipped backwards.
In blog forums, private conversations, openreview.net, etc.
References
Agrahari R et al (2022) Assessing feature representations for instance-based cross-domain anomaly detection in cloud services univariate time series data. IoT 3(1):123–144
Alzantot M, Chakraborty S, Srivastava M (2017) Sensegen: a deep learning architecture for synthetic sensor data generation. In: 2017 IEEE international conference on pervasive computing and communications workshops (PerCom Workshops), pp 188–193. IEEE
Aminifar F et al (2022) A review of power system protection and asset management with machine learning techniques. Energy Syst 13(4):855–892
Audibert J, Marti S, Guyard F, Zuluaga MA (2021) From univariate to multivariate time series anomaly detection with non-local information. In: International workshop on advanced analytics and learning on temporal data, pp 186–194. Springer, Cham
Audibert J, Michiardi P, Guyard F, Marti S, Zuluaga MA (2022) Do deep neural networks contribute to multivariate time series anomaly detection? arXiv preprint https://arxiv.org/abs/2204.01637
Audibert J, Michiardi P, Guyard F, Marti S, Zuluaga MA (2020) USAD: unsupervised anomaly detection on multivariate time series. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pp 3395–3404
Boniol P, Linardi M, Roncallo F, Palpanas T, Meftah M, Remy E (2021) Unsupervised and scalable subsequence anomaly detection in large data series. VLDB J 30(6):909–931
Brophy E, Wang Z, She Q, Ward T (2021) Generative adversarial networks in time series: A survey and taxonomy. arXiv preprint https://arxiv.org/abs/2107.11098
C22MP (2022) Supporting webpage: sites.google.com/view/c22mp/home
Dau HA et al (2019) The UCR time series archive. IEEE/CAA J Automatica Sinica 6(6):1293–1305
Fährmann D, Damer N, Kirchbuchner F, Kuijper A (2022) Lightweight long short-term memory variational auto-encoder for multivariate time series anomaly detection in industrial control systems. Sensors 22(8):2886
Fengming Z, Shufang L, Zhimin G, Bo W, Shiming T, Mingming P (2017) Anomaly detection in smart grid based on encoder-decoder framework with recurrent neural network. J China Univ Posts Telecommun 24(6):67–73
Geiger A, Liu D, Alnegheimish S, Cuesta-Infante A, Veeramachaneni K (2020) TadGAN: time series anomaly detection using generative adversarial networks. In: 2020 IEEE international conference on big data (Big Data), pp 33–43. IEEE
Goh J, Adepu S, Junejo KN, Mathur A (2016) A dataset to support research in the design of secure water treatment systems. In: International conference on critical information infrastructures security, pp 88–99. Springer
Haq IU, Lee BS (2023) TransNAS-TSAD: harnessing transformers for multi-objective neural architecture search in time series anomaly detection. arXiv preprint https://arxiv.org/abs/2311.18061
Huet A, Navarro JM, Rossi D (2022) Local evaluation of time series anomaly detection algorithms. In: Proceedings of the 28th ACM SIGKDD, pp 635–645
Hundman K, Constantinou V, Laporte C, Colwell I, Soderstrom T (2018) Detecting spacecraft anomalies using LSTMs and nonparametric dynamic thresholding. In: Proceedings of 24th ACM SIGKDD, pp 387–395
Idé T (2006) Why does subsequence time-series clustering produce sine waves? In: Knowledge discovery in databases: PKDD 2006: 10th European conference on principles and practice of knowledge discovery in databases Berlin, Germany, Proceedings, vol 10, pp 211–222. Springer, Berlin
Jackson TD et al (2021) The motion of trees in the wind: a data synthesis. Biogeosciences 18(13):4059–4072
Keogh E, Lin J (2005) Clustering of time-series subsequences is meaningless: implications for previous and future research. Knowl Inf Syst 8:154–177
Kravchik M, Shabtai A (2021) Efficient cyber attack detection in industrial control systems using lightweight neural networks and PCA. IEEE Trans Depend Secure Comput 19(4):2179–2197
Lai KH, Zha D, Xu J, Zhao Y, Wang G, Hu X (2021) Revisiting time series outlier detection: Definitions and benchmarks. In: 35th Conference on NeurIPS datasets and benchmarks track
Li D, Chen D, Jin B, Shi L, Goh J, Ng SK (2019) MAD-GAN: multivariate anomaly detection for time series data with generative adversarial networks. In: Artificial neural networks and machine learning—ICANN 2019: text and time series: 28th international conference on artificial neural networks, Munich, Germany, Proceedings, part IV, pp 703–716. Springer, Cham
Liu HY, Gao ZZ, Wang ZH, Deng YH (2022) Time series classification with shapelet and canonical features. Appl Sci 12(17):8685
Loh WY (2011) Classification and regression trees. Wiley Interdiscip Rev Data Min Knowl Discov 1(1):14–23
Lu Y, Wu R, Mueen A, Zuluaga MA, Keogh E (2022) Matrix profile XXIV: scaling time series anomaly detection to trillions of datapoints and ultra-fast arriving data streams. In: Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining, pp 1173–1182
Lubba CH, Sethi SS, Knaute P, Schultz SR, Fulcher BD, Jones NS (2019) catch22: CAnonical time-series CHaracteristics. Data Min Knowl Disc 33(6):1821–1852
Lauer J, Zhou M, Ye S, Menegas W, Nath T, Rahman MM, Di Santo V, Soberanes D, Feng G, Murthy VN, Lauder G (2021) Multi-animal pose estimation and tracking with DeepLabCut. BioRxiv
MacQueen J (1967) Classification and analysis of multivariate observations. In: 5th Berkeley symposium on mathematics and statistics and probability, pp 281–297. University of California, Los Angeles
Marimon X, Traserra S, Jiménez M, Ospina A, Benítez R (2022) Detection of abnormal cardiac response patterns in cardiac tissue using deep learning. Mathematics 10(15):2786
Munir M, Siddiqui SA, Dengel A, Ahmed S (2018) DeepAnT: a deep learning approach for unsupervised anomaly detection in time series. IEEE Access 19(7):1991–2005
Nakamura T, Imamura M, Mercer R, Keogh E (2020) Merlin: parameter-free discovery of arbitrary length anomalies in massive time series archives. In: 2020 IEEE ICDM, pp 1190–1195
Park D, Hoshi Y, Kemp CC (2018) A multimodal anomaly detector for robot-assisted feeding using an LSTM-based variational autoencoder. IEEE Robot Autom Lett 3(3):1544–1551
Ren H, Xu B, Wang Y, Yi C, Huang C, Kou X, Xing T, Yang M, Tong J, Zhang Q (2019) Time-series anomaly detection service at Microsoft. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp 3009–3017
Rewicki F, Denzler J, Niebling J (2022) Is it worth it? An experimental comparison of six deep-and classical machine learning methods for unsupervised anomaly detection in time series. arXiv preprint https://arxiv.org/abs/2212.11080
Saarela M, Jauhiainen S (2021) Comparison of feature importance measures as explanations for classification models. SN Appl Sci 3:1–2
Thompson DW (1917) On growth and form. Cambridge University Press
Tuli S, Casale G, Jennings NR (2022) TranAD: deep transformer networks for anomaly detection in multivariate time series data. arXiv preprint https://arxiv.org/abs/2201.07284
Turowski M et al (2022) Modeling and generating synthetic anomalies for energy and power time series. In: Proceedings of the 13th ACM e-Energy, pp 471–484
Wang R, Liu C, Mou X, Guo X, Gao K, Liu P, Wo T, Liu X (2022) Deep contrastive one-class time series anomaly detection. arXiv preprint https://arxiv.org/abs/2207.01472
Wen Q, Sun L, Yang F, Song X, Gao J, Wang X, Xu H (2020) Time series data augmentation for deep learning: a survey. arXiv preprint https://arxiv.org/abs/2002.12478
Wu R, Keogh E (2021) Current time series anomaly detection benchmarks are flawed and are creating the illusion of progress. IEEE TKDE
Yairi T, Kato Y, Hori K (2001) Fault detection by mining association rules from house-keeping data. In: Proceedings of the 6th international symposium on artificial intelligence, robotics and automation in space, vol 18, p 21. Citeseer
Yankov D, Keogh E, Rebbapragada U (2008) Disk aware discord discovery: finding unusual time series in terabyte sized datasets. Knowl Inf Syst 17:241–262
Yoon J, Jarrett D, Van der Schaar M (2019) Time-series generative adversarial networks. In: Advances in neural information processing systems, vol 32
Zhang C, Kuppannagari SR, Kannan R, Prasanna VK (2018) Generative adversarial network for synthetic time series data generation in smart grids. In: 2018 IEEE international conference on communications, control, and computing technologies for smart grids (SmartGridComm), pp 1–6. IEEE
Zhu Y, Yeh CC, Zimmerman Z, Kamgar K, Keogh E (2018) Matrix profile XI: SCRIMP++: time series motif discovery at interactive speeds. In: 2018 IEEE ICDM, pp 837–846
Acknowledgements
We thank all the creators of the data sets used in this work and the original authors of catch22, who were very helpful with their time [33].
Funding
Funding was provided by gifts from Google and Mitsubishi and by NSF Award 2103976.
Author information
Contributions
ST was involved in algorithm design, writing, and implementation. YL was involved in the design of the algorithm comparison measure. RW was involved in the optimization of the code. TVAS was involved in the design of the normalization algorithm. HDC was involved in the design of the biological (mouse) experiments. RM was involved in the design of the DNA experiments. EK was involved in writing and editing.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
About this article
Cite this article
Tafazoli, S., Lu, Y., Wu, R. et al. C22MP: the marriage of catch22 and the matrix profile creates a fast, efficient and interpretable anomaly detector. Knowl Inf Syst 66, 4789–4823 (2024). https://doi.org/10.1007/s10115-024-02107-5
DOI: https://doi.org/10.1007/s10115-024-02107-5