A Confidence-Guided Anomaly Detection Approach Jointly Using Multiple Machine Learning Algorithms

Xie, Xueshuo; Jin, Zongming; Han, Qingqi; Huang, Shenwei; Li, Tao

doi:10.1007/978-3-030-37352-8_8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11983)

Included in the following conference series:

International Symposium on Cyberspace Safety and Security

Abstract

Log data records system states and behavior in rich detail and can be used to diagnose system failures, so anomaly detection on large-scale log data plays a key role in building secure and trustworthy systems. Anomaly detection models based on machine learning have achieved good results in practical applications. However, logs generated by modern large-scale distributed systems are more complex than ever before in both size and variety. As a result, a traditional anomaly detection model built on a single machine learning algorithm faces the model aging problem. We design an anomaly detection model that combines multiple machine learning algorithms. Using conformal prediction, we calculate the confidence of each algorithm for each log to be detected and use statistical analysis to tag the log with a trusted label. The approach was tested on the public HDFS_100k log dataset, and the results show that our model is more accurate.
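
The idea of scoring each algorithm with conformal prediction and then choosing a label can be illustrated with a minimal sketch. It is not the authors' implementation: the toy event-count vectors, the two scoring functions, and the names `p_value`, `conformal_label`, and `joint_label` are all hypothetical. The combination rule (trust the algorithm with the highest confidence) is an assumption standing in for the statistical analysis, which the abstract does not specify.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Scorer = Callable[[Vector], float]  # higher score = looks more anomalous


def p_value(cal_scores: List[float], test_score: float) -> float:
    """Share of calibration scores at least as nonconforming as the test score."""
    at_least = sum(1 for s in cal_scores if s >= test_score)
    return (at_least + 1) / (len(cal_scores) + 1)


def conformal_label(score_fn: Scorer,
                    calibration: List[Tuple[Vector, str]],
                    x: Vector) -> Tuple[str, float, float]:
    """Return (label, credibility, confidence) for x under one algorithm."""
    p: Dict[str, float] = {}
    # Nonconformity to "normal" is the anomaly score; to "anomaly" its negation.
    for label, sign in (("normal", 1.0), ("anomaly", -1.0)):
        cal = [sign * score_fn(v) for v, y in calibration if y == label]
        p[label] = p_value(cal, sign * score_fn(x))
    best = max(p, key=p.get)
    other = "anomaly" if best == "normal" else "normal"
    # Credibility: p-value of the chosen label.
    # Confidence: one minus the p-value of the rejected label.
    return best, p[best], 1.0 - p[other]


def joint_label(detectors: Dict[str, Scorer],
                calibration: List[Tuple[Vector, str]],
                x: Vector) -> Tuple[str, str, float]:
    """Assumed rule: take the label of the most confident algorithm."""
    results = [(name,) + conformal_label(fn, calibration, x)
               for name, fn in detectors.items()]
    name, label, _cred, conf = max(results, key=lambda r: (r[3], r[2]))
    return label, name, conf


# Toy calibration set: [info_event_count, error_event_count] per log session.
calibration = [
    ([5, 0], "normal"), ([6, 0], "normal"), ([5, 1], "normal"), ([4, 0], "normal"),
    ([5, 4], "anomaly"), ([12, 3], "anomaly"), ([11, 5], "anomaly"),
]

# Two stand-in "algorithms"; real detectors would supply their own scores.
detectors: Dict[str, Scorer] = {
    "total_events": lambda v: float(sum(v)),
    "error_events": lambda v: float(v[1]),
}

if __name__ == "__main__":
    for session in ([5, 5], [5, 0]):
        print(session, joint_label(detectors, calibration, session))
```

Each algorithm contributes only a nonconformity score, so detectors with incomparable raw outputs can be compared on the common scale of p-values.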

Acknowledgment

This work is partially supported by the National Key Research and Development Program of China (No. 2018YFB2100300, 2016YFC0400709), the National Natural Science Foundation (No. 61872200), the Natural Science Foundation of Tianjin (18YFYZCG00060) and Nankai University (91922299).

Author information

Authors and Affiliations

College of Computer Science, Nankai University, Tianjin, China
Xueshuo Xie, Zongming Jin, Qingqi Han, Shenwei Huang & Tao Li

Corresponding author

Correspondence to Tao Li.

Editor information

Editors and Affiliations

Rutgers University, Newark, NJ, USA
Jaideep Vaidya
Beihang University, Beijing, China
Xiao Zhang
Guangzhou University, Guangzhou, China
Jin Li

About this paper

Cite this paper

Xie, X., Jin, Z., Han, Q., Huang, S., Li, T. (2019). A Confidence-Guided Anomaly Detection Approach Jointly Using Multiple Machine Learning Algorithms. In: Vaidya, J., Zhang, X., Li, J. (eds) Cyberspace Safety and Security. CSS 2019. Lecture Notes in Computer Science, vol 11983. Springer, Cham. https://doi.org/10.1007/978-3-030-37352-8_8

DOI: https://doi.org/10.1007/978-3-030-37352-8_8
Published: 03 January 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37351-1
Online ISBN: 978-3-030-37352-8
eBook Packages: Computer Science, Computer Science (R0)
