Abstract
System logs are an important data source for performance monitoring and anomaly detection, and analyzing them to detect anomalies can improve service quality. Although current machine learning algorithms for anomaly detection achieve high accuracy, they lack robustness: when system logs contain noise introduced by careless operators or by log template updates, the detection model cannot dynamically adapt to the changes. To address this challenge, this paper proposes a robust log anomaly detection method based on contrastive learning and a multi-scale Masked Sequence to Sequence (MASS) model. First, a log feature extraction model that integrates BERT with contrastive learning is proposed. It extracts effective features by pulling related normal logs together and pushing normal and abnormal logs apart, which ensures that the semantic similarity between normal and abnormal log templates is lower than that between normal log templates. The model can therefore remove abnormal log templates effectively and assign normal log templates and normal noisy log templates to their log categories, rather than indiscriminately treating noisy log templates as anomalies, which enhances the robustness of anomaly detection. Then, a multi-scale Masked Sequence to Sequence (MSMASS) model is proposed, in which the attention mechanism of the MASS model is replaced with multi-scale attention so that the model fully learns context information at different scales of the log sequence, which improves the accuracy of anomaly detection. Comparative experiments against four baseline methods on common datasets show that the proposed method is superior to most existing log-based anomaly detection methods in terms of accuracy and robustness.
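The pull-together and push-apart idea in the abstract can be illustrated with a small sketch. It assumes an InfoNCE-style loss over cosine similarities, which is a common contrastive objective and not necessarily the exact loss used in the paper. The three-dimensional vectors are toy stand-ins for BERT embeddings of log templates, and the names `cosine`, `contrastive_loss`, and `temperature` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumption, not the paper's stated objective).

    The loss is small when the anchor is similar to the positive and
    dissimilar to every negative, and large otherwise.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy embeddings (hypothetical): two related normal templates and one abnormal.
normal_a = [1.0, 0.1, 0.0]
normal_b = [0.9, 0.2, 0.0]
abnormal = [0.0, 0.1, 1.0]

# Desired geometry: the normal pair is close, the abnormal template is far away.
well_separated = contrastive_loss(normal_a, normal_b, [abnormal])

# Undesired pairing: the abnormal template is treated as the positive.
poorly_separated = contrastive_loss(normal_a, abnormal, [normal_b])

print(f"loss with normal pair as positive:   {well_separated:.6f}")
print(f"loss with abnormal template as positive: {poorly_separated:.6f}")
```

Minimizing a loss of this shape drives the similarity between normal templates above the similarity between normal and abnormal templates, which is the property the abstract relies on to separate noisy normal templates from true anomalies.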
Funding
This work is supported by the National Key R&D Program of China (Grant No. 2018YFB1601502) and the LiaoNing Revitalization Talents Program (Grant No. XLYC1902071).
Cite this article
Wang, X., Cao, Q., Wang, Q. et al. Robust log anomaly detection based on contrastive learning and multi-scale MASS. J Supercomput 78, 17491–17512 (2022). https://doi.org/10.1007/s11227-022-04508-1
DOI: https://doi.org/10.1007/s11227-022-04508-1