Abstract
Detecting variable-length anomalous subsequences in time series has many important real-world applications, yet the methods in existing studies are computationally expensive because they rely mostly on brute-force search. In this work, we formulate the detection problem as an optimization task, the subsequence segmentation problem (SSP), in which a set of cutting points segments the time series into subsequences so that their total distance to a representative motif is minimized. The anomalous subsequences can then be accurately located by reducing the dissimilarity among all subsequences, and this formulation requires fewer comparisons during the search than existing techniques. We further introduce a new clustering-based and swarm intelligence-based evolutionary algorithm (CBSI) to solve the highly complex SSP efficiently. The proposed method balances exploration and exploitation through a local-global search strategy. The CBSI clusters the solutions in the search space into groups and lets solutions in the same cluster share information frequently, so that each cluster exploits its own search region. The global-search strategy then promotes the best local solutions to explore the remaining search regions. Through a comparison with state-of-the-art techniques on both synthetic and real-world problems, we show that any optimization method applied to our proposed SSP brings significant computational savings and comparable search accuracy relative to existing techniques for the detection task. Our proposed CBSI also shows the highest search capability among the existing and related optimization methods compared. The experimental results further highlight the scalability of our approach to longer time series, larger anomaly sizes and wider search ranges.
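As an illustration of the SSP objective described above, the following minimal sketch evaluates one candidate set of cutting points by summing the distances between the resulting subsequences and a representative motif. It is not the authors' implementation: the function names `resample`, `segment_series` and `ssp_cost`, the linear-interpolation resampling used to compare variable-length subsequences with the motif, the Euclidean distance, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def resample(segment, length):
    """Linearly interpolate a segment onto `length` evenly spaced points (assumed scheme)."""
    segment = np.asarray(segment, dtype=float)
    if len(segment) == 1:
        return np.full(length, segment[0])
    old_grid = np.linspace(0.0, 1.0, num=len(segment))
    new_grid = np.linspace(0.0, 1.0, num=length)
    return np.interp(new_grid, old_grid, segment)


def segment_series(series, cut_points):
    """Split the series at the given interior cutting points (indices)."""
    series = np.asarray(series, dtype=float)
    bounds = [0, *sorted(cut_points), len(series)]
    return [series[a:b] for a, b in zip(bounds[:-1], bounds[1:]) if b > a]


def ssp_cost(series, cut_points, motif):
    """Return (total distance, per-subsequence distances) for one segmentation.

    Assumption: each subsequence is resampled to the motif length and
    compared with the motif by Euclidean distance.
    """
    motif = np.asarray(motif, dtype=float)
    dists = [
        float(np.linalg.norm(resample(seg, len(motif)) - motif))
        for seg in segment_series(series, cut_points)
    ]
    return sum(dists), dists


# Toy data (hypothetical): three periods, the last one carrying an anomaly.
motif = [0, 1, 0, -1]
series = [0, 1, 0, -1, 0, 1, 0, -1, 0, 5, 0, -1]

total, dists = ssp_cost(series, [4, 8], motif)      # cuts aligned with the periods
shifted_total, _ = ssp_cost(series, [3, 9], motif)  # misaligned cuts

print(total, shifted_total)
print("most dissimilar subsequence:", int(np.argmax(dists)))
```

In this toy case the aligned cutting points give a lower total distance than the misaligned ones, and the subsequence with the largest remaining distance is the candidate anomaly. An optimizer for the SSP would search over `cut_points` to minimize the first return value of `ssp_cost`.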
Data Availability Statement
All datasets used in this paper are available online. Specifically, all synthesized datasets in Section 4.2 are collected from the UCR time series classification archive; see Reference [4] for details. The ECG time series data in Section 4.2.1 are collected from the MIT-BIH Arrhythmia Database; see Reference [11] for details. The traffic volume dataset is collected from the Freeway Bureau of Taiwan; see Reference [6] for details.
References
Abdollahzadeh B, Gharehchopogh FS, Mirjalili S (2021) African vultures optimization algorithm: a new nature-inspired metaheuristic algorithm for global optimization problems. Comput Ind Eng 158:107408
Ahmed M, Mahmood AN, Islam MR (2016) A survey of anomaly detection techniques in financial domain. Future Gener Comput Syst 55:278–288. https://doi.org/10.1016/j.future.2015.01.001
Crawford B, Soto R, Astorga G et al (2017) Putting continuous metaheuristics to work in binary search spaces. Complexity 2017:1–19. https://doi.org/10.1155/2017/8404231
Dau HA, Bagnall A, Kamgar K et al (2018) The UCR time series classification archive. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/
Febrero M, Galeano P, González-Manteiga W (2007) Outlier detection in functional data by depth measures, with application to identify abnormal NOx levels. Environmetrics 19(4):331–345. https://doi.org/10.1002/env.878
Freeway Bureau (2022) The ministry of transportation and communications of Taiwan. https://tisvcloud.freeway.gov.tw
Gálvez J, Cuevas E, Becerra H et al (2020) A hybrid optimization approach based on clustering and chaotic sequences. Int J Mach Learn Cybernet 11:359–401
Gupta A, Datta S, Das S (2018) Fast automatic estimation of the number of clusters from the minimum inter-center distance for k-means clustering. Pattern Recognit Lett 116:72–79. https://doi.org/10.1016/j.patrec.2018.09.003
Heidari AA, Mirjalili S, Faris H et al (2019) Harris hawks optimization: algorithm and applications. Future Gener Comput Syst 97:849–872
Hu M, Feng X, Ji Z et al (2019) A novel computational approach for discord search with local recurrence rates in multivariate time series. Inf Sci 477:220–233. https://doi.org/10.1016/j.ins.2018.10.047
Keogh E, Lin J, Fu A (2005) HOT SAX: efficiently finding the most unusual time series subsequence. In: The fifth IEEE international conference on data mining. IEEE Computer Society, 1106352, pp 226–233. https://doi.org/10.1109/icdm.2005.79. http://www.cs.cuhk.hk/~adafu/Pub/icdm05time.pdf
Levine J, Ducatelle F (2004) Ant colony optimization and local search for bin packing and cutting stock problems. J Oper Res Soc 55:705–716
Li S, Chen H, Wang M et al (2020) Slime mould algorithm: a new method for stochastic optimization. Future Gener Comput Syst 111:300–323
Li Y, Lin J, Oates T (2012) Visualizing variable-length time series motifs. In: SDM. Society for industrial and applied mathematics, pp 895–906. https://doi.org/10.1137/1.9781611972825.77
Lin J, Keogh E, Lonardi S et al (2003) A symbolic representation of time series, with implications for streaming algorithms. In: Zaki MJ, Aggarwal CC (eds) Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery - DMKD ’03. Association for Computing Machinery, New York, NY, USA, pp 2–11. https://doi.org/10.1145/882082.882086
Linardi M, Zhu Y, Palpanas T et al (2020) Matrix profile goes mad: variable-length motif and discord discovery in data series. Data Min Knowl Disc 34:1022–1071. https://doi.org/10.1007/s10618-020-00685-w. arXiv:2008.13447
Lu Q, Wang Z, Chen M (2008) An ant colony optimization algorithm for the one-dimensional cutting stock problem with multiple stock lengths. 2008 Fourth Int Conf Nat Comput 7:475–479. https://doi.org/10.1109/icnc.2008.208
Luo W, Gallagher M (2011) Faster and parameter-free discord search in quasi-periodic time series. In: Huang JZ, Cao L, Srivastava J (eds) Advances in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science, vol 6635. Springer-Verlag, Berlin Heidelberg, pp 135–148. https://doi.org/10.1007/978-3-642-20847-8_12
Luo W, Gallagher M, Wiles J (2013) Parameter-free search of time-series discord. J Comput Sci Technol 28(2):300–310. https://doi.org/10.1007/s11390-013-1330-8
Matsumoto K, Umetani S, Nagamochi H (2011) On the one-dimensional stock cutting problem in the paper tube industry. J Sched 14:281–290. https://doi.org/10.1007/s10951-010-0164-2
Nguyen TPQ, Phuc PNK, Yang CL et al (2023) Time-series anomaly detection using dynamic programming based longest common subsequence on sensor data. Expert Syst Appl 213:118902
Paparrizos J, Gravano L (2016) k-shape: efficient and accurate clustering of time series. SIGMOD Rec 45:69–76. https://doi.org/10.1145/2723372.2737793
Phoa FKH (2017) A swarm intelligence based (sib) method for optimization in designs of experiments. Nat Comput 16(4):597–605. https://doi.org/10.1007/s11047-016-9555-4
Phoa FKH, Chen RB, Wang W et al (2016) Optimizing two-level supersaturated designs using swarm intelligence techniques. Technometrics 58:43–49
Rahmani A, Afra S, Zarour O et al (2014) Graph-based approach for outlier detection in sequential data and its application on stock market and weather data. Knowl-Based Syst 61:89–97. https://doi.org/10.1016/j.knosys.2014.02.008
Rohlfshagen P, Bullinaria JA (2007) A genetic algorithm with exon shuffling crossover for hard bin packing problems. In: Lipson H (ed) GECCO ’07. ACM Press, pp 1365–1371. https://doi.org/10.1145/1276958.1277213. http://www.cs.bham.ac.uk/~jxb/PUBS/BPP.pdf
Sanchez IAL, Mora-Vargas J, Santos CA et al (2018) Solving binary cutting stock with matheuristics using particle swarm optimization and simulated annealing. Soft Comput 22:6111–6119. https://doi.org/10.1007/s00500-017-2666-8
Santhosh KK, Dogra DP, Roy PP et al (2021) Vehicular trajectory classification and traffic anomaly detection in videos using a hybrid CNN-VAE architecture. IEEE Transactions on Intelligent Transportation Systems pp 1–12. https://doi.org/10.1109/tits.2021.3108504
Senin P, Lin J, Wang X et al (2014) Grammarviz 2.0: a tool for grammar-based pattern discovery in time series. In: Calders T, Esposito F, Hüllermeier E et al (eds) ECML/PKDD, vol 8726. Springer Berlin Heidelberg, pp 468–472. https://doi.org/10.1007/978-3-662-44845-8_37
Senin P, Lin J, Wang X et al (2018) Grammarviz 3.0: interactive discovery of variable-length time series patterns. ACM Trans Knowl Disc Data (TKDD) 12:1–28. https://doi.org/10.1145/3051126
Wang J, Ma Y, Zhang L et al (2018) Deep learning for smart manufacturing: methods and applications. J Manuf Syst 48:144–156. https://doi.org/10.1016/j.jmsy.2018.01.003
Yang CL, Sutrisno H (2020) A clustering-based symbiotic organisms search algorithm for high-dimensional optimization problems. Appl Soft Comput 97:106722. https://doi.org/10.1016/j.asoc.2020.106722
Yang CL, Darwin F, Sutrisno H (2019) Local recurrence rates with automatic time windows for discord search in multivariate time series. Procedia Manuf 39:1783–1792. https://doi.org/10.1016/j.promfg.2020.01.261
Yeh CCM, Zhu Y, Ulanova L et al (2016) Matrix profile i: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. 2016 IEEE 16th International Conference on Data Mining (ICDM) pp 1317–1322. https://doi.org/10.1109/icdm.2016.0179
Zhang L, Gao Y, Lin J (2020) Semantic discord: Finding unusual local patterns for time series. In: Demeniconi C, Chawla NV (eds) Proceedings of the 2020 SIAM International Conference on Data Mining, SIAM. Society for Industrial and Applied Mathematics, pp 136–144. https://doi.org/10.1137/1.9781611976236.16
Zhang Y, Chen Y, Wang J et al (2021) Unsupervised deep anomaly detection for multi-sensor time-series signals. IEEE Trans Knowl Data Eng abs/2107.12626:1–1. https://doi.org/10.1109/TKDE.2021.3102110. arXiv:2107.12626
Acknowledgements
This project is partly supported by Academia Sinica Grant Nos. AS-TP-109-M07 and AS-IA-112-M03, and the National Science Council (Taiwan) Grant Nos. 107-2118-M-001-011-MY3 and 111-2118-M-001-007-MY2.
About this article
Cite this article
Sutrisno, H., Phoa, F.K.H. Anomalous variable-length subsequence detection in time series: mathematical formulation and a novel evolutionary algorithm based on clustering and swarm intelligence. Appl Intell 53, 29585–29603 (2023). https://doi.org/10.1007/s10489-023-05066-6
DOI: https://doi.org/10.1007/s10489-023-05066-6