Abstract
As networks scale up and concurrent requests grow, unexpected network anomalies become more frequent, leading to service interruptions and degraded user experience. Real-time, accurate troubleshooting is therefore critical for ensuring satisfactory service. Existing troubleshooting solutions adopt ensemble anomaly detection (EAD) because of its robustness. However, the base classifier parameters in EAD are fixed and set by expert experience, which may reduce the efficiency of anomaly detection when the data distribution changes. Furthermore, the base classifiers pass only binary results to the secondary classifier, and the resulting information loss compromises accuracy and leads to inaccurate root cause localization. In addition, although key performance indicators (KPIs) are crucial for measuring system performance, relying on multiple redundant KPIs to identify the root causes of anomalies is time-consuming and error-prone.
To address these issues, we propose ATS, a fully automatic troubleshooting system. ATS introduces a new EAD method to detect anomalies, together with a module that triggers root cause localization. Specifically, the EAD method updates the parameters of the base classifiers so that they adapt dynamically to different KPI data distributions. The soft labels generated by the base classifiers are then combined and fed into the secondary classifier, which avoids the information loss caused by binary outputs. Next, a heuristic module selects the most appropriate KPI data according to a metric, the bilayer relative difference, and uses it to trigger efficient root cause localization. Extensive experiments demonstrate that ATS is more than twice as fast as most state-of-the-art solutions while achieving higher troubleshooting accuracy.
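The difference between passing binary and soft labels to a secondary classifier can be illustrated with a small sketch. This is not the ATS implementation: the two base detectors (`zscore_soft`, `mad_soft`), the squashing constant `half`, the threshold-based secondary classifier `fit_stump`, and the toy KPI values and labels are all assumptions made for illustration. It only shows that two samples with identical binary labels can still be separated once the underlying scores are kept.

```python
import statistics

def zscore_soft(window, x, half=3.0):
    """Hypothetical base detector: squash |z| into [0, 1); 0.5 when |z| == half."""
    mu = statistics.fmean(window)
    sigma = statistics.pstdev(window) or 1e-9
    z = abs(x - mu) / sigma
    return z / (z + half)

def mad_soft(window, x, half=3.0):
    """Hypothetical base detector: same squashing, based on median and MAD."""
    med = statistics.median(window)
    mad = statistics.median([abs(v - med) for v in window]) or 1e-9
    r = abs(x - med) / mad
    return r / (r + half)

def soft_labels(window, x):
    """Soft labels: one anomaly score per base detector."""
    return [zscore_soft(window, x), mad_soft(window, x)]

def hard_labels(window, x, cut=0.5):
    """Binary labels: each soft score thresholded at `cut`."""
    return [int(s >= cut) for s in soft_labels(window, x)]

def fit_stump(features, targets):
    """Toy secondary classifier (an assumption, not the paper's model):
    a threshold on the mean base score, chosen to minimise training errors.
    Returns (threshold, number_of_training_errors)."""
    means = [sum(f) / len(f) for f in features]
    best_t, best_err = None, None
    for t in sorted(set(means)):
        err = sum(int(m >= t) != y for m, y in zip(means, targets))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

# Toy baseline window and labelled KPI samples (illustrative values only).
WINDOW = [10, 11, 9, 10, 10, 11, 9, 10]
XS = [10.5, 12.0, 12.5, 13.0, 14.0, 20.0]
Y = [0, 0, 0, 1, 1, 1]  # 1 = anomaly

_, soft_err = fit_stump([soft_labels(WINDOW, x) for x in XS], Y)
_, hard_err = fit_stump([hard_labels(WINDOW, x) for x in XS], Y)
print("training errors with soft labels:", soft_err)
print("training errors with hard labels:", hard_err)
```

In this toy data the samples 12.5 and 13.0 receive the same binary labels from both detectors, so no secondary classifier trained on binary inputs can tell them apart, whereas their soft scores still differ.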
Acknowledgements
This work is supported by the National Key Research and Development Program of China (No. 2021YFB2910108).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yuan, L. et al. (2023). ATS: A Fully Automatic Troubleshooting System with Efficient Anomaly Detection and Localization. In: Mikyška, J., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2023. ICCS 2023. Lecture Notes in Computer Science, vol 10477. Springer, Cham. https://doi.org/10.1007/978-3-031-36030-5_38
DOI: https://doi.org/10.1007/978-3-031-36030-5_38
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36029-9
Online ISBN: 978-3-031-36030-5
eBook Packages: Computer Science, Computer Science (R0)