Parameterless Semi-supervised Anomaly Detection in Univariate Time Series

  • Conference paper
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12457)

Abstract

Anomaly detection algorithms that operate without human intervention are needed when dealing with large time series data coming from poorly understood processes. At the same time, common techniques expect the user to provide precise information about the data generating process or to manually tune various parameters. We present SIM-AD: a semi-supervised approach to detecting anomalies in univariate time series data that operates without any user-defined parameters. The approach involves converting time series using our proposed Sojourn Time Representation and then applying modal clustering-based anomaly detection on the converted data. We evaluate SIM-AD on three publicly available time series datasets from different domains and compare its accuracy to the PAV and RRA anomaly detection algorithms. We conclude that SIM-AD outperforms the evaluated approaches with respect to accuracy on trendless time series data.

Author information

Corresponding author

Correspondence to Oleg Iegorov.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Iegorov, O., Fischmeister, S. (2021). Parameterless Semi-supervised Anomaly Detection in Univariate Time Series. In: Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Lecture Notes in Computer Science, vol 12457. Springer, Cham. https://doi.org/10.1007/978-3-030-67658-2_37

  • DOI: https://doi.org/10.1007/978-3-030-67658-2_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67657-5

  • Online ISBN: 978-3-030-67658-2

  • eBook Packages: Computer Science (R0)