Robust log anomaly detection based on contrastive learning and multi-scale MASS

Wang, Xuejie; Cao, Qilei; Wang, Qiaozheng; Cao, Zhiying; Zhang, Xiuguo; Wang, Peipeng

doi:10.1007/s11227-022-04508-1

Published: 20 May 2022

Volume 78, pages 17491–17512 (2022)

The Journal of Supercomputing

Xuejie Wang¹, Qilei Cao², Qiaozheng Wang¹, Zhiying Cao¹ (ORCID: orcid.org/0000-0001-7738-985X), Xiuguo Zhang¹ & Peipeng Wang¹

Abstract

System logs are an important data source for performance monitoring and anomaly detection, and analyzing them for anomalies can improve service quality. Although current machine learning algorithms for log anomaly detection can achieve high accuracy, they lack robustness: when system logs contain noise, whether introduced by careless operators or by updates to log templates, the detection model cannot dynamically adapt to the changed logs. To address this challenge, this paper proposes a robust log anomaly detection method based on contrastive learning and multi-scale Masked Sequence to Sequence (MASS). First, a log feature extraction model that integrates BERT with contrastive learning is proposed. It extracts effective features by pulling two related normal logs together and pushing normal and abnormal logs apart, so that the semantic similarity between normal and abnormal log templates is lower than that between normal log templates. This allows the model to filter out abnormal log templates effectively and to identify the log categories to which normal log templates and noisy normal log templates belong, rather than crudely treating noisy log templates as anomalies, which enhances the robustness of anomaly detection. Then, a multi-scale Masked Sequence to Sequence (MSMASS) model is proposed, in which the attention mechanism of MASS is replaced with multi-scale attention so that the model fully learns context information at different scales of the log sequence, improving the accuracy of anomaly detection. Comparative experiments against four baseline methods on commonly used datasets show that the proposed method outperforms most existing log-based anomaly detection methods in both accuracy and robustness.
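The abstract does not state the exact contrastive loss. The sketch below assumes a SimCSE-style InfoNCE objective, one common way to "pull two related normal logs together and push normal and abnormal logs apart": cosine similarity between embeddings, scaled by a temperature `tau` (0.05 here is an illustrative assumption, not a value from the paper). Plain Python lists stand in for BERT sentence embeddings, and `cosine` and `info_nce_loss` are hypothetical helper names.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor: Vector, positive: Vector,
                  negatives: List[Vector], tau: float = 0.05) -> float:
    """InfoNCE loss for one anchor (assumed SimCSE-style objective).

    The positive (e.g. a noisy variant of the same normal log) is the target
    class; the positive and all negatives (e.g. abnormal logs) form the
    denominator. Minimizing the loss raises sim(anchor, positive) relative
    to sim(anchor, negative).
    """
    logits = [cosine(anchor, positive) / tau]
    logits += [cosine(anchor, neg) / tau for neg in negatives]
    # log-sum-exp with the max subtracted for numerical stability
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]  # equals -log(softmax(logits)[0])


# Toy 2-D vectors standing in for BERT sentence embeddings (hypothetical).
normal = [1.0, 0.0]
noisy_normal = [0.9, 0.1]
abnormal = [0.0, 1.0]
print(info_nce_loss(normal, noisy_normal, [abnormal]))  # close to 0
print(info_nce_loss(normal, abnormal, [noisy_normal]))  # much larger
```

The loss is small when the noisy normal log is the positive and the abnormal log is the negative, and large when the roles are swapped, which is the ordering of similarities the abstract describes (normal-to-normal above normal-to-abnormal).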
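The abstract says MSMASS replaces the attention mechanism of MASS with multi-scale attention but does not give the head layout. The sketch below assumes the windowed formulation of multi-scale self-attention, in which each head only attends within a local window of a different size and the head outputs are concatenated. To stay small it uses the input vectors directly as queries, keys and values (no learned projections); `windowed_attention` and `multi_scale_attention` are hypothetical names.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def windowed_attention(x: Matrix, window: int) -> Matrix:
    """Scaled dot-product self-attention in which position i only attends to
    positions j with |i - j| <= window. Queries, keys and values are x itself."""
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        idx = [j for j in range(n) if abs(i - j) <= window]
        scores = [sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d) for j in idx]
        weights = softmax(scores)
        # weighted sum of the value vectors inside the window
        out.append([sum(w * x[j][k] for w, j in zip(weights, idx)) for k in range(d)])
    return out


def multi_scale_attention(x: Matrix, windows: List[int]) -> Matrix:
    """One attention head per window size; head outputs are concatenated
    along the feature axis so each position carries context at every scale."""
    heads = [windowed_attention(x, w) for w in windows]
    return [sum((h[i] for h in heads), []) for i in range(len(x))]


# Four toy log-event embeddings (hypothetical) and three scales.
events = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
for row in multi_scale_attention(events, windows=[0, 1, 3]):
    print([round(v, 3) for v in row])
```

With a window of 0 a head sees only the current log event, while a window at least as long as the sequence reduces to ordinary full self-attention; intermediate windows capture patterns among neighboring events, which is the kind of "different scales of the log sequence" context the abstract refers to.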

Notes

Loghub: https://github.com/logpai/loghub
Keras: https://keras.io (2015)

Funding

This work is supported by the National Key R&D Program of China (Grant No. 2018YFB1601502) and the LiaoNing Revitalization Talents Program (Grant No. XLYC1902071).

Author information

Authors and Affiliations

School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China
Xuejie Wang, Qiaozheng Wang, Zhiying Cao, Xiuguo Zhang & Peipeng Wang
School of Computer Science and Technology, Shandong Technology and Business University, Yantai, 264003, China
Qilei Cao

Corresponding authors

Correspondence to Qilei Cao or Zhiying Cao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, X., Cao, Q., Wang, Q. et al. Robust log anomaly detection based on contrastive learning and multi-scale MASS. J Supercomput 78, 17491–17512 (2022). https://doi.org/10.1007/s11227-022-04508-1

Accepted: 06 April 2022
Published: 20 May 2022
Issue Date: November 2022
DOI: https://doi.org/10.1007/s11227-022-04508-1
