Abstract
Anomaly detection algorithms that operate without human intervention are needed when dealing with large time series data coming from poorly understood processes. At the same time, common techniques expect the user to provide precise information about the data generating process or to manually tune various parameters. We present SIM-AD: a semi-supervised approach to detecting anomalies in univariate time series data that operates without any user-defined parameters. The approach involves converting time series using our proposed Sojourn Time Representation and then applying modal clustering-based anomaly detection on the converted data. We evaluate SIM-AD on three publicly available time series datasets from different domains and compare its accuracy to the PAV and RRA anomaly detection algorithms. We conclude that SIM-AD outperforms the evaluated approaches with respect to accuracy on trendless time series data.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Iegorov, O., Fischmeister, S. (2021). Parameterless Semi-supervised Anomaly Detection in Univariate Time Series. In: Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Lecture Notes in Computer Science, vol 12457. Springer, Cham. https://doi.org/10.1007/978-3-030-67658-2_37
DOI: https://doi.org/10.1007/978-3-030-67658-2_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67657-5
Online ISBN: 978-3-030-67658-2
eBook Packages: Computer Science (R0)