Robust log anomaly detection based on contrastive learning and multi-scale MASS

The Journal of Supercomputing

Abstract

System logs are an important data source for performance monitoring and anomaly detection, and analyzing them can improve service quality. Although current machine learning algorithms for anomaly detection achieve high accuracy, they lack robustness: the detection model cannot adapt dynamically when system logs contain noise introduced by careless operators or by log template updates. To address this challenge, this paper proposes a robust log anomaly detection method based on contrastive learning and multi-scale Masked Sequence to Sequence (MASS). First, a log feature extraction model that integrates BERT with contrastive learning is proposed. It extracts effective features by pulling related normal logs together and pushing normal and abnormal logs apart, which ensures that the semantic similarity between normal and abnormal log templates is lower than that between normal log templates. The model can therefore filter out abnormal log templates and assign normal log templates and noisy normal log templates to their log categories, rather than crudely treating noisy templates as anomalies, which enhances the robustness of anomaly detection. Second, a multi-scale Masked Sequence to Sequence (MSMASS) model is proposed, in which the attention mechanism of MASS is replaced with multi-scale attention so that the model fully learns context information at different scales of the log sequence, which improves detection accuracy. Comparative experiments against four baseline methods on common datasets show that the proposed method is superior to most existing log-based anomaly detection methods in terms of accuracy and robustness.
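
The pull-together, push-apart idea in the abstract can be illustrated with a small sketch. The abstract does not give the paper's loss function, so the code below substitutes a generic InfoNCE-style contrastive loss over cosine similarities as an assumption. The function names, the temperature value, and the two-dimensional toy vectors (standing in for BERT template embeddings) are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in, not the paper's exact loss).

    The loss is small when the anchor is more similar to the positive
    than to every negative, and large otherwise.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy embeddings (hypothetical): a normal template, a noisy variant of it,
# and an abnormal template pointing in a different direction.
normal = [1.0, 0.1]
noisy_normal = [0.9, 0.2]
abnormal = [-0.2, 1.0]

# Desired arrangement: the noisy normal template is the positive pair.
good = contrastive_loss(normal, noisy_normal, [abnormal])
# Undesired arrangement: the abnormal template is treated as the positive.
bad = contrastive_loss(normal, abnormal, [noisy_normal])
print(good < bad)
```

Minimizing such a loss during training would push the encoder toward embeddings in which noisy normal templates stay close to their normal counterparts while abnormal templates are kept apart, which is the property the abstract relies on for robustness.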

Notes

  1. https://github.com/logpai/loghub.

  2. https://keras.io. 2015.

Funding

This work is supported by the National Key R&D Program of China (Grant No. 2018YFB1601502) and the LiaoNing Revitalization Talents Program (Grant No. XLYC1902071).

Author information

Corresponding authors

Correspondence to Qilei Cao or Zhiying Cao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, X., Cao, Q., Wang, Q. et al. Robust log anomaly detection based on contrastive learning and multi-scale MASS. J Supercomput 78, 17491–17512 (2022). https://doi.org/10.1007/s11227-022-04508-1

  • DOI: https://doi.org/10.1007/s11227-022-04508-1
