Abstract
As computing clusters and large-scale network systems grow more complex, log-based anomaly detection has gained significant attention as a way to identify system issues caused by machine failures or malicious attacks. To capture contextual information and local features in log sequences effectively, we introduce SD-BERT (BERT, Bidirectional Encoder Representations from Transformers, with separated score attention and a dual-branch module), a log anomaly detection method built from BERT encoder blocks. SD-BERT uses normal log sequences as training data and is trained by predicting masked log keys. Taking into account the characteristics of log anomaly detection tasks, we redesign the scoring mechanism and propose separated score attention (SSA), which strengthens the model's attention to different tokens and positions in a sequence. Because log sequence anomalies are related to partial segments of the sequence, we design a dual-branch module with an SSA branch and a convolutional branch. The SSA branch captures the global context related to the abnormal position, while the convolutional branch captures local abnormal details. Together, the two branches give the model a more comprehensive understanding of anomalous behavior in log sequences. Comparative experiments are conducted on the HDFS, BGL, and Thunderbird datasets. The results show that SD-BERT performs comparably to or better than the compared models, supporting its effectiveness for log anomaly detection.
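The masked log key objective mentioned in the abstract can be illustrated with a minimal sketch of the data preparation step. This is not the authors' implementation: the `[MASK]` token, the `mask_ratio` default, the function name `mask_log_keys`, and the example log keys are all assumptions made for illustration, and the abstract does not state the masking ratio or the masking strategy actually used.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the paper


def mask_log_keys(sequence, mask_ratio=0.15, seed=None):
    """Randomly replace a fraction of log keys with MASK.

    Returns (masked_sequence, targets), where targets maps each masked
    position to the original log key a model would be trained to predict.
    The mask_ratio default is an assumption, not a value from the paper.
    """
    if not sequence:
        return [], {}
    rng = random.Random(seed)
    # Mask at least one position, and never more positions than exist.
    n_mask = min(len(sequence), max(1, round(len(sequence) * mask_ratio)))
    positions = sorted(rng.sample(range(len(sequence)), n_mask))
    masked = list(sequence)
    targets = {}
    for pos in positions:
        targets[pos] = sequence[pos]
        masked[pos] = MASK
    return masked, targets


# Usage with made-up log keys (event template IDs) from a normal sequence.
normal_sequence = ["E5", "E22", "E5", "E11", "E9", "E11", "E26", "E26"]
masked, targets = mask_log_keys(normal_sequence, mask_ratio=0.25, seed=0)
```

In a setup like this, only normal sequences are masked and fed to the model during training, so a trained model would be expected to predict masked keys in normal sequences well and masked keys in anomalous sequences poorly. How SD-BERT turns those predictions into an anomaly decision is not described in the abstract.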






Data availability
The HDFS, BGL, and Thunderbird datasets used in this paper are publicly available from the following links. HDFS: https://github.com/logpai/loghub/tree/master/HDFS, BGL: https://github.com/logpai/loghub/tree/master/BGL, Thunderbird: https://github.com/logpai/loghub/tree/master/Thunderbird.
Funding
This work is supported in part by the National Key R&D Program of China (Grant No. 2020YFC1523004).
Author information
Contributions
P.T. proposed the paper's innovations, designed and carried out the experiments, and analyzed the experimental results. Y.G. contributed to the revision of the manuscript. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Ethical approval
Not applicable.
About this article
Cite this article
Tang, P., Guan, Y. Log anomaly detection based on BERT. SIViP 18, 6431–6441 (2024). https://doi.org/10.1007/s11760-024-03327-6
DOI: https://doi.org/10.1007/s11760-024-03327-6