Log anomaly detection based on BERT

  • Original Paper
Signal, Image and Video Processing

Abstract

With the increasing complexity of computing clusters and large-scale network systems, log-based anomaly detection has gained significant attention as a way to identify system issues caused by machine failures or malicious attacks. To capture contextual information and local features in log sequences effectively, we introduce SD-BERT (BERT, Bidirectional Encoder Representations from Transformers, with separated score attention and a dual-branch module), a log anomaly detection method derived from BERT encoder blocks. SD-BERT uses normal log sequences as training data and is trained by predicting masked log keys. Taking into account the characteristics of log anomaly detection tasks, we redesign the scoring mechanism and propose separated score attention (SSA), which enhances the model's attention to different tokens and positions in a sequence. Since log sequence anomalies are related to partial segments of the sequence, we design a dual-branch module with an SSA branch and a convolutional branch. The SSA branch captures the global context related to the abnormal position, while the convolutional branch captures local abnormal details. This dual-branch design gives the model a more comprehensive view of anomalous behavior in log sequences. Comparative experiments are conducted on the HDFS, BGL, and Thunderbird datasets. The results show that SD-BERT achieves comparable or superior performance relative to the compared models, supporting its effectiveness for log anomaly detection.
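The training objective mentioned in the abstract, predicting masked log keys in normal sequences, can be illustrated with a minimal sketch of how one training example might be built. This is not the authors' implementation: the `[MASK]` token name, the `mask_log_keys` helper, the 0.15 default masking ratio (borrowed from common BERT practice), and the toy log keys are all assumptions made for illustration.

```python
import random

MASK = "[MASK]"  # hypothetical mask token name, not taken from the paper


def mask_log_keys(sequence, mask_ratio=0.15, seed=0):
    """Return (masked_sequence, targets) for one log key sequence.

    targets maps each masked position to the original log key that a
    model would be trained to predict at that position.
    """
    if not sequence:
        return [], {}
    rng = random.Random(seed)
    # Mask at least one position so every sequence yields a training signal.
    n_mask = max(1, round(len(sequence) * mask_ratio))
    n_mask = min(n_mask, len(sequence))
    positions = sorted(rng.sample(range(len(sequence)), n_mask))
    masked = list(sequence)
    targets = {}
    for pos in positions:
        targets[pos] = sequence[pos]  # label the model should recover
        masked[pos] = MASK            # input the model actually sees
    return masked, targets


# Toy sequence of log keys (event template IDs); the values are made up.
normal_sequence = ["E5", "E22", "E5", "E11", "E9", "E11", "E9", "E26"]
masked, targets = mask_log_keys(normal_sequence, mask_ratio=0.25)
print(masked)
print(targets)
```

Because only normal sequences are used for training, a model trained this way learns which keys are plausible in a given context; how SD-BERT turns the resulting predictions into an anomaly decision is described in the full paper, not in this sketch.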

Data availability

The HDFS, BGL, and Thunderbird datasets used in this paper are publicly available from the following links. HDFS: https://github.com/logpai/loghub/tree/master/HDFS, BGL: https://github.com/logpai/loghub/tree/master/BGL, Thunderbird: https://github.com/logpai/loghub/tree/master/Thunderbird.

Funding

This work is supported in part by the National Key R&D Program of China (Grant No. 2020YFC1523004).

Author information

Contributions

P.T. proposed the innovations of the paper, designed and carried out the experiments, and analyzed the experimental results. Y.G. contributed to revising the manuscript. All authors reviewed the manuscript.

Corresponding author

Correspondence to Yepeng Guan.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tang, P., Guan, Y. Log anomaly detection based on BERT. SIViP 18, 6431–6441 (2024). https://doi.org/10.1007/s11760-024-03327-6
