Detecting log anomaly using subword attention encoder and probabilistic feature selection

Hariharan, M.; Mishra, Abhinesh; Ravi, Sriram; Sharma, Ankita; Tanwar, Anshul; Sundaresan, Krishna; Ganesan, Prasanna; Karthik, R.

doi:10.1007/s10489-023-04674-6

Published: 26 June 2023

Volume 53, pages 22297–22312, (2023)
Applied Intelligence

M. Hariharan ORCID: orcid.org/0000-0002-9382-568X¹,
Abhinesh Mishra¹,
Sriram Ravi¹,
Ankita Sharma¹,
Anshul Tanwar¹,
Krishna Sundaresan¹,
Prasanna Ganesan¹ &
…
R. Karthik²

Abstract

A log anomaly is a manifestation of a software system error or security threat. Detecting such unusual behaviour across logs in real time is the driving force behind large-scale autonomous monitoring technology that can rapidly raise alerts on zero-day attacks. Increasingly, AI methods are being used to process voluminous log datasets and reveal patterns of correlated anomalies. In this paper, we propose an enhanced approach to learning semantic-aware embeddings for logs, called the Subword Encoder Neural network (SEN). Addressing a key limitation of previous semantic log parsing works, the proposed work learns word vectors at subword-level granularity using an attention encoder strategy. The learnt embeddings reflect the contextual and lexical relationships at the word level. As a result, the learnt word representations precisely capture new log messages not previously seen by the model. Furthermore, we develop a novel feature distillation algorithm, termed Naive Bayes Feature Selector (NBFS), to extract useful log events. This probabilistic technique examines the occurrence pattern of events and selects only the salient ones that can aid anomaly detection. To the best of our knowledge, this is the first attempt to associate affinity to log events based on the target task. Since the predictions can be traced to the log messages, the AI is also inherently explainable. The model outperforms state-of-the-art methods by a fair margin, achieving a detection F1-score of 0.99 on the BGL, HDFS and OpenStack benchmark log datasets.
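
The abstract does not give the details of NBFS, so the following is only an illustrative sketch of the general idea of selecting log events by their occurrence pattern, not the authors' algorithm. It assumes labelled sessions of event IDs and scores each event by the absolute log-ratio of its smoothed per-class presence probabilities, in the style of a Bernoulli Naive Bayes model. The function names, the smoothing constant `alpha`, the `top_k` cut-off and the toy sessions are all hypothetical.

```python
import math
from collections import Counter


def event_scores(sessions, labels, alpha=1.0):
    """Score events by |log P(event | anomalous) - log P(event | normal)|.

    sessions: list of lists of event IDs.
    labels:   1 for an anomalous session, 0 for a normal one.
    alpha:    Laplace smoothing constant (an assumption of this sketch).
    """
    counts = {0: Counter(), 1: Counter()}
    totals = {0: 0, 1: 0}
    for events, label in zip(sessions, labels):
        # Count presence once per session rather than raw frequency.
        counts[label].update(set(events))
        totals[label] += 1

    scores = {}
    for event in set(counts[0]) | set(counts[1]):
        p_anom = (counts[1][event] + alpha) / (totals[1] + 2 * alpha)
        p_norm = (counts[0][event] + alpha) / (totals[0] + 2 * alpha)
        scores[event] = abs(math.log(p_anom) - math.log(p_norm))
    return scores


def select_events(sessions, labels, top_k):
    """Return the top_k events with the largest class-discriminating score."""
    scores = event_scores(sessions, labels)
    # Sort by descending score; break ties by event ID for determinism.
    ranked = sorted(scores, key=lambda e: (-scores[e], e))
    return ranked[:top_k]


# Toy data (hypothetical): E1 appears in every session, E4 and E5 only
# in anomalous ones.
sessions = [
    ["E1", "E2", "E3"],
    ["E1", "E2"],
    ["E1", "E2", "E3"],
    ["E1", "E4", "E5"],
    ["E1", "E2", "E5"],
]
labels = [0, 0, 0, 1, 1]

scores = event_scores(sessions, labels)
selected = select_events(sessions, labels, top_k=2)
print(selected)
```

In this toy input, the event that occurs in every session (`E1`) receives the lowest score, while events seen only in anomalous sessions rank highest. Because the selected features are event IDs, each one can be traced back to concrete log messages.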

Data availability

The syslog data that support the findings of this study are available in the LogHub public repository with the identifier https://doi.org/10.48550/arXiv.2008.06448

Author information

Authors and Affiliations

Cisco Systems India Pvt Ltd, Bengaluru, India
M. Hariharan, Abhinesh Mishra, Sriram Ravi, Ankita Sharma, Anshul Tanwar, Krishna Sundaresan & Prasanna Ganesan
Center for Cyber Physical Systems, Vellore Institute of Technology, Chennai, India
R. Karthik

Corresponding author

Correspondence to M. Hariharan.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hariharan, M., Mishra, A., Ravi, S. et al. Detecting log anomaly using subword attention encoder and probabilistic feature selection. Appl Intell 53, 22297–22312 (2023). https://doi.org/10.1007/s10489-023-04674-6

Accepted: 26 April 2023
Published: 26 June 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s10489-023-04674-6
