Abstract
In the era of big data, many information and technology systems store huge amounts of streaming data in real time, for example, server-access logs on web application servers. Anomaly detection in such voluminous streaming data is becoming increasingly important. One of the biggest challenges is real-time contextual anomaly detection in streaming data whose varying patterns are visually detectable but do not fit a parametric model. Most anomaly detection algorithms handle streaming time-series data containing such patterns poorly. In this paper, we propose a novel method for online contextual anomaly detection in streaming time-series data using generalized extreme studentized deviate (GESD) tests. The GESD test is relatively accurate and efficient because it performs statistical hypothesis testing, but it cannot handle streaming time-series data. Thus, focusing on streaming time-series data, we propose an online version of the test capable of detecting outliers under varying patterns. We perform extensive experiments with simulated data, synthetic data, and real online traffic data from Yahoo Webscope, showing a clear advantage of the proposed method, particularly for streaming data with varying patterns.
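For readers unfamiliar with the GESD test mentioned above, the following is a minimal sketch of the classic offline procedure (Rosner's many-outlier test), not the online variant proposed in this paper. The function names `t_pdf`, `t_cdf`, `t_ppf`, and `gesd` are illustrative assumptions, and the Student's t quantile is approximated numerically (Simpson's rule plus bisection) only to keep the sketch free of third-party dependencies.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0, via Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + total * h / 3


def t_ppf(p, df):
    """Approximate quantile for p > 0.5, found by bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def gesd(values, max_outliers, alpha=0.05):
    """Return sorted indices of the points flagged by the offline GESD test."""
    n = len(values)
    if max_outliers > n - 2:
        raise ValueError("max_outliers must be at most len(values) - 2")
    remaining = list(enumerate(values))
    removed, n_outliers = [], 0
    for i in range(1, max_outliers + 1):
        xs = [v for _, v in remaining]
        m, s = mean(xs), stdev(xs)
        if s == 0:
            break
        # Most extreme remaining point and its studentized deviation R_i.
        pos = max(range(len(xs)), key=lambda k: abs(xs[k] - m))
        r_i = abs(xs[pos] - m) / s
        # Critical value lambda_i for the i-th step.
        df = n - i - 1
        t = t_ppf(1 - alpha / (2 * (n - i + 1)), df)
        lam = (n - i) * t / math.sqrt((df + t * t) * (n - i + 1))
        removed.append(remaining.pop(pos)[0])
        if r_i > lam:
            n_outliers = i  # keep the largest i with R_i > lambda_i
    return sorted(removed[:n_outliers])


window = [10.0, 10.2, 9.9, 10.1, 9.8, 10.3, 10.0, 9.7, 10.2, 25.0, 10.1, 9.9]
print(gesd(window, max_outliers=3))
```

Because the procedure needs the whole sample to compute each mean and standard deviation, applying it to a stream would require some windowing or updating scheme; how the paper's online version does this is not shown here.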
Acknowledgements
This research was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (Grant No. NRF-2020R1F1A1076278).
Cite this article
Ryu, M., Lee, G. & Lee, K. Online sequential extreme studentized deviate tests for anomaly detection in streaming data with varying patterns. Cluster Comput 24, 1975–1987 (2021). https://doi.org/10.1007/s10586-021-03236-0
DOI: https://doi.org/10.1007/s10586-021-03236-0