A Confidence-Guided Anomaly Detection Approach Jointly Using Multiple Machine Learning Algorithms

  • Conference paper
Cyberspace Safety and Security (CSS 2019)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11983)

Abstract

Log data records system states and behavior in rich detail and can be used to diagnose system failures. Anomaly detection on large-scale log data therefore plays a key role in building secure and trustworthy systems. Anomaly detection models based on machine learning have achieved good results in practical applications. However, logs generated by modern large-scale distributed systems are more complex than ever before in both size and variety, so a traditional anomaly detection model built on a single machine learning algorithm suffers from the model aging problem. We design an anomaly detection model that combines multiple machine learning algorithms. Using conformal prediction, we calculate the confidence of each algorithm for each log to be detected and use statistical analysis to assign each log a trusted label. The approach was tested on the public HDFS_100k log dataset, and the results show that our model is more accurate.
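
The abstract does not spell out how the per-algorithm confidence is computed, so the following is only a minimal sketch of standard conformal prediction, not the authors' implementation. It assumes each detector supplies nonconformity scores for a calibration set and for the log under test, derives a p-value per candidate label, and then keeps the label from the detector with the highest confidence. The function names, the detector names, and the "pick the most confident detector" rule are all hypothetical.

```python
from typing import Dict, List, Tuple


def conformal_p_value(calibration_scores: List[float], test_score: float) -> float:
    """Share of nonconformity scores that are at least as large as test_score.

    The test point itself is counted, hence the +1 in numerator and denominator.
    """
    at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def label_with_confidence(p_values: Dict[str, float]) -> Tuple[str, float, float]:
    """Return (label, credibility, confidence) from per-label p-values.

    credibility = largest p-value; confidence = 1 - second-largest p-value.
    """
    ranked = sorted(p_values.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_p = ranked[0]
    second_p = ranked[1][1] if len(ranked) > 1 else 0.0
    return best_label, best_p, 1.0 - second_p


def most_confident(results: Dict[str, Tuple[str, float, float]]) -> Tuple[str, str]:
    """Hypothetical combination rule: keep the detector with the highest
    confidence, breaking ties by credibility. Returns (detector, label)."""
    name = max(results, key=lambda n: (results[n][2], results[n][1]))
    return name, results[name][0]


if __name__ == "__main__":
    # Made-up calibration scores per label for one illustrative detector.
    calibration = {"normal": [0.1, 0.2, 0.3, 0.4], "anomaly": [0.5, 0.6, 0.7, 0.8]}
    # Made-up nonconformity of the log under test, under each candidate label.
    test_scores = {"normal": 0.35, "anomaly": 0.9}

    p_values = {
        label: conformal_p_value(calibration[label], test_scores[label])
        for label in calibration
    }
    results = {
        "detector_a": label_with_confidence(p_values),
        "detector_b": ("anomaly", 0.6, 0.9),  # illustrative values only
    }
    print(p_values)
    print(most_confident(results))
```

A low credibility for the winning label would suggest that the log resembles none of the calibration data well, which is one way a drift or model aging signal could be surfaced under these assumptions.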

Acknowledgment

This work is partially supported by the National Key Research and Development Program of China (No. 2018YFB2100300, 2016YFC0400709), the National Natural Science Foundation (No. 61872200), the Natural Science Foundation of Tianjin (18YFYZCG00060) and Nankai University (91922299).

Author information

Corresponding author

Correspondence to Tao Li.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Xie, X., Jin, Z., Han, Q., Huang, S., Li, T. (2019). A Confidence-Guided Anomaly Detection Approach Jointly Using Multiple Machine Learning Algorithms. In: Vaidya, J., Zhang, X., Li, J. (eds) Cyberspace Safety and Security. CSS 2019. Lecture Notes in Computer Science, vol 11983. Springer, Cham. https://doi.org/10.1007/978-3-030-37352-8_8

  • DOI: https://doi.org/10.1007/978-3-030-37352-8_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37351-1

  • Online ISBN: 978-3-030-37352-8

  • eBook Packages: Computer Science, Computer Science (R0)
