Log anomaly detection based on BERT

Tang, Pan; Guan, Yepeng

doi:10.1007/s11760-024-03327-6

Original Paper
Published: 13 June 2024

Volume 18, pages 6431–6441, (2024)

Signal, Image and Video Processing

Pan Tang¹ & Yepeng Guan¹,²,³

Abstract

With the increasing complexity of computing clusters and large-scale network systems, log-based anomaly detection has gained significant attention as a way to identify system issues caused by machine failures or malicious attacks. To capture contextual information and local features in log sequences effectively, we introduce SD-BERT (BERT, Bidirectional Encoder Representations from Transformers, with separated score attention and a dual-branch module), a log anomaly detection method derived from BERT encoder blocks. SD-BERT uses normal log sequences as training data and is trained by predicting masked log keys. Taking into account the characteristics of log anomaly detection tasks, we redesign the scoring mechanism and propose separated score attention (SSA), which helps enhance the model's attention towards different tokens and positions in a sequence. Since log sequence anomalies are related to partial segments of the sequence, we design a dual-branch module with an SSA branch and a convolutional branch. The SSA branch captures the global context related to the abnormal position, while the convolutional branch helps capture local abnormal details. This dual-branch design gives the model a more comprehensive understanding of anomalous behavior in log sequences. A series of comparative experiments is conducted on the HDFS, BGL, and Thunderbird datasets. The experimental results show that SD-BERT achieves comparable or superior performance relative to the compared models, confirming its effectiveness in log anomaly detection.
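The training objective mentioned in the abstract, predicting masked log keys, can be illustrated with a minimal sketch of the data preparation step alone. The abstract does not specify the masking ratio, the mask token, or how masked positions are chosen, so `MASK_ID`, `mask_ratio`, and the function name `mask_log_keys` below are illustrative assumptions rather than the authors' settings. The sketch covers only the construction of inputs and prediction targets, not the SD-BERT model itself.

```python
import random

# Hypothetical ID reserved for the [MASK] token; real log keys are assumed
# to use other integer IDs. This value is an assumption, not from the paper.
MASK_ID = 0


def mask_log_keys(sequence, mask_ratio=0.15, seed=None):
    """Replace a random subset of log keys with MASK_ID.

    Returns (masked_sequence, targets), where targets maps each masked
    position to the original log key a model would be trained to predict.
    """
    if not sequence:
        return [], {}
    rng = random.Random(seed)
    # Mask at least one key, and never more keys than the sequence holds.
    n_mask = min(len(sequence), max(1, round(len(sequence) * mask_ratio)))
    positions = rng.sample(range(len(sequence)), n_mask)
    masked = list(sequence)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]  # remember the true key as the label
        masked[pos] = MASK_ID       # hide it from the model input
    return masked, targets


if __name__ == "__main__":
    # A made-up sequence of log key IDs standing in for one normal session.
    keys = [5, 5, 22, 11, 9, 11, 9, 26, 26, 23]
    masked, targets = mask_log_keys(keys, mask_ratio=0.2, seed=42)
    print("input :", masked)
    print("labels:", targets)
```

Because training uses only normal sequences, a model fitted on such pairs learns which keys are plausible in a given context. How SD-BERT turns prediction errors into an anomaly decision is not stated in the abstract and is left out of this sketch.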

Data availability

The HDFS, BGL, and Thunderbird datasets used in this paper are publicly available and can be obtained from the following links. HDFS: https://github.com/logpai/loghub/tree/master/HDFS, BGL: https://github.com/logpai/loghub/tree/master/BGL, Thunderbird: https://github.com/logpai/loghub/tree/master/Thunderbird.

Funding

This work is supported in part by the National Key R&D Program of China (Grant No. 2020YFC1523004).

Author information

Authors and Affiliations

School of Communication and Information Engineering, Shanghai University, Shanghai, 200444, China
Pan Tang & Yepeng Guan
Key Laboratory of Advanced Display and System Application, Ministry of Education, Shanghai, 200072, China
Yepeng Guan
Key Laboratory of Silicate Cultural Relics Conservation (Shanghai University), Ministry of Education, Shanghai, 200444, China
Yepeng Guan

Contributions

P.T. proposed the innovations of the paper, designed and carried out the experiments, and analyzed the experimental results. Y.G. contributed to the revision of the manuscript. All authors reviewed the manuscript.

Corresponding author

Correspondence to Yepeng Guan.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tang, P., Guan, Y. Log anomaly detection based on BERT. SIViP 18, 6431–6441 (2024). https://doi.org/10.1007/s11760-024-03327-6

Received: 03 January 2024
Revised: 24 March 2024
Accepted: 27 May 2024
Published: 13 June 2024
Issue Date: September 2024
DOI: https://doi.org/10.1007/s11760-024-03327-6
