Online sequential extreme studentized deviate tests for anomaly detection in streaming data with varying patterns

Cluster Computing

Abstract

In the era of big data, numerous information and technology systems store huge amounts of streaming data in real time, for example server-access logs on web application servers. Anomaly detection in such voluminous streaming data is rapidly growing in importance. One of the biggest challenges is real-time contextual anomaly detection in streaming data with varying patterns that are visually detectable but unsuitable for a parametric model. Most anomaly detection algorithms handle streaming time-series data containing such patterns poorly. In this paper, we propose a novel method for online contextual anomaly detection in streaming time-series data using generalized extreme studentized deviate (GESD) tests. The GESD test is relatively accurate and efficient because it performs statistical hypothesis testing, but it cannot handle streaming time-series data. We therefore propose an online version of the test that detects outliers under varying patterns. Extensive experiments with simulated data, synthetic data, and real online traffic data from Yahoo Webscope show a clear advantage of the proposed method, particularly for streaming data with varying patterns.
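
For readers unfamiliar with the GESD test named in the abstract, the following is a minimal sketch of the classical batch procedure (Rosner, 1983), not the online variant proposed in the article. The function names `t_cdf`, `t_ppf`, and `gesd` are hypothetical, and the Student-t quantile is approximated numerically (Simpson integration plus bisection) only to keep the sketch free of third-party dependencies.

```python
import math
import statistics


def t_cdf(x, df, steps=2000):
    """Student-t CDF for x >= 0, via Simpson integration of the density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 + total * h / 3


def t_ppf(p, df):
    """Quantile of Student's t for p > 0.5, by bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # grow the bracket until it contains the quantile
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def gesd(values, max_outliers, alpha=0.05):
    """Indices flagged as outliers by the two-sided batch GESD test."""
    n = len(values)
    if n - max_outliers - 1 < 1:
        raise ValueError("max_outliers is too large for this sample size")
    remaining = list(range(n))
    removed = []
    n_outliers = 0
    for i in range(1, max_outliers + 1):
        sample = [values[j] for j in remaining]
        mean = statistics.fmean(sample)
        sd = statistics.stdev(sample)
        if sd == 0:
            break
        # R_i: largest absolute studentized deviate in the reduced sample.
        worst = max(remaining, key=lambda j: abs(values[j] - mean))
        r_i = abs(values[worst] - mean) / sd
        # lambda_i: critical value from the t distribution.
        p = 1 - alpha / (2 * (n - i + 1))
        df = n - i - 1
        t = t_ppf(p, df)
        lam = (n - i) * t / math.sqrt((df + t * t) * (n - i + 1))
        remaining.remove(worst)
        removed.append(worst)
        # The outlier count is the largest i with R_i > lambda_i.
        if r_i > lam:
            n_outliers = i
    return sorted(removed[:n_outliers])


# Illustrative data (made up): a smooth signal with two injected spikes.
signal = [10 + 0.5 * math.sin(k) for k in range(40)]
signal[7] = 25.0
signal[29] = -8.0
print(gesd(signal, max_outliers=3))
```

The test removes the most extreme point at each step and recomputes the mean and standard deviation, which is why it needs the whole sample at once. A streaming setting would have to apply some such test to a bounded, moving portion of the data, and how the article does this is not shown here.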

Acknowledgements

This research was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (Grant No. NRF-2020R1F1A1076278).

Author information

Corresponding author

Correspondence to Kichun Lee.

About this article

Cite this article

Ryu, M., Lee, G. & Lee, K. Online sequential extreme studentized deviate tests for anomaly detection in streaming data with varying patterns. Cluster Comput 24, 1975–1987 (2021). https://doi.org/10.1007/s10586-021-03236-0

  • DOI: https://doi.org/10.1007/s10586-021-03236-0
