ATS: A Fully Automatic Troubleshooting System with Efficient Anomaly Detection and Localization

  • Conference paper
Computational Science – ICCS 2023 (ICCS 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 10477)

Abstract

As network scale expands and concurrent requests grow, unexpected network anomalies become more frequent, leading to service interruptions and degraded user experience. Real-time, accurate troubleshooting is therefore critical for ensuring satisfactory service. Existing troubleshooting solutions adopt ensemble anomaly detection (EAD) for its robustness. However, the base classifier parameters in EAD are fixed by expert experience, which may reduce detection efficiency when the data distribution changes. Furthermore, feeding only binary results to the secondary classifier in EAD causes information loss, which compromises detection accuracy and leads to inaccurate root cause localization. In addition, key performance indicators (KPIs) are crucial for measuring system performance, but relying on multiple redundant KPIs to identify the root causes of anomalies is time-consuming and error-prone.

To address these issues, we propose ATS, a fully automatic troubleshooting system. ATS introduces a new EAD method to detect anomalies and a module that triggers root cause localization. Specifically, the EAD method updates the parameters of the base classifiers to adapt dynamically to different KPI data distributions. The ensemble of soft labels generated by the base classifiers is then fed into the secondary classifier to achieve information-lossless anomaly detection. Next, a heuristic module selects the most appropriate KPI data based on a metric, the bilayer relative difference, to trigger efficient root cause localization. Extensive experiments demonstrate that ATS is more than twice as fast as most state-of-the-art solutions while achieving higher troubleshooting accuracy.
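The difference between feeding binary results and feeding soft labels to a secondary classifier can be illustrated with a minimal sketch. It is not the ATS implementation: the two base detectors (z-score and median/MAD deviation), the squashing function, the `HARD_CUTOFF` threshold, the toy KPI values, and the nearest-centroid secondary classifier are all assumptions chosen only to keep the example small and deterministic.

```python
import statistics

HARD_CUTOFF = 3.0  # hypothetical expert-set threshold on the raw deviation


def raw_deviations(value, history):
    """Two toy base detectors: z-score and median/MAD deviation."""
    mean = statistics.fmean(history)
    std = statistics.pstdev(history) or 1e-9
    med = statistics.median(history)
    mad = statistics.median([abs(x - med) for x in history]) or 1e-9
    return [abs(value - mean) / std, abs(value - med) / mad]


def soft_labels(value, history):
    """Squash each raw deviation into [0, 1) instead of thresholding it."""
    return [d / (1.0 + d) for d in raw_deviations(value, history)]


def hard_labels(value, history):
    """Binary output per detector: 1 if the deviation reaches the cutoff."""
    return [int(d >= HARD_CUTOFF) for d in raw_deviations(value, history)]


def fit_centroids(features, labels):
    """Stand-in secondary classifier: one centroid per class."""
    centroids = {}
    for cls in set(labels):
        rows = [f for f, y in zip(features, labels) if y == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, feature):
    """Return the class whose centroid is closest to the feature vector."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(feature, center))
    return min(centroids, key=lambda cls: sq_dist(centroids[cls]))


# Toy KPI history and a small labelled set (0 = normal, 1 = anomaly).
history = [10, 11, 9, 10, 12, 10, 9, 11, 10, 8]
train_values = [10, 11, 9, 12, 20, 25, 0, 30]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]

soft_model = fit_centroids(
    [soft_labels(v, history) for v in train_values], train_labels)
hard_model = fit_centroids(
    [hard_labels(v, history) for v in train_values], train_labels)

borderline = 12.8  # just below the hard cutoff of both detectors
print("hard labels:", hard_labels(10.0, history), hard_labels(borderline, history))
print("soft labels:", soft_labels(10.0, history), soft_labels(borderline, history))
print("hard-label decision:", predict(hard_model, hard_labels(borderline, history)))
print("soft-label decision:", predict(soft_model, soft_labels(borderline, history)))
```

In this toy setting the borderline value and a perfectly typical value produce the same binary vector, so no secondary classifier can tell them apart, while their soft-label vectors remain distinct. A real system would use stronger base detectors and a learned secondary classifier; the sketch only shows where the information is lost.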

Acknowledgements

This work is supported by the National Key Research and Development Program of China (No. 2021YFB2910108).

Author information

Corresponding author

Correspondence to Jiyan Sun.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Yuan, L. et al. (2023). ATS: A Fully Automatic Troubleshooting System with Efficient Anomaly Detection and Localization. In: Mikyška, J., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2023. ICCS 2023. Lecture Notes in Computer Science, vol 10477. Springer, Cham. https://doi.org/10.1007/978-3-031-36030-5_38

  • DOI: https://doi.org/10.1007/978-3-031-36030-5_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36029-9

  • Online ISBN: 978-3-031-36030-5

  • eBook Packages: Computer Science, Computer Science (R0)
