DOI: 10.1145/3625549.3658830
short-paper

Semantic-Aware Log Understanding and Analysis

Published: 30 August 2024

Abstract

The exponential growth in system complexity and the corresponding surge in log data volume call for advanced log analysis techniques to support efficient system management and anomaly detection. Traditional log understanding and analysis methods often fail to capture the rich semantic context of log messages, which leads to suboptimal monitoring and diagnostic capabilities. This paper aims to bridge this semantic gap by integrating semantic technologies into the log analysis pipeline. We leverage natural language processing, information retrieval, and large language models to enrich log data with semantic information, facilitating a deeper understanding of log messages. Our methodology improves anomaly detection accuracy by using hierarchical contextual information and pre-training, and it refines log-based question answering (QA) through a log retrieval component and a log reader. Preliminary results demonstrate a significant improvement in identifying and diagnosing system anomalies, as well as in automatically answering questions about logs. This research not only advances log data analysis but also sets the stage for future work in intelligent system monitoring and proactive fault resolution. Through this semantic-aware approach, we envision a new paradigm in log analysis that goes beyond traditional machine learning methods and offers a more robust and intuitive understanding of system behaviors and states.
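
To make the retrieval half of log-based QA concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plain TF-IDF scorer with cosine similarity stands in for whatever retriever the paper uses, and it omits the reader stage entirely. The function names `tokenize`, `build_index`, `cosine`, and `retrieve`, as well as the sample `LOGS`, are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical sample log messages, invented for this illustration.
LOGS = [
    "Receiving block blk_1 src: /10.0.0.1 dest: /10.0.0.2",
    "PacketResponder 1 for block blk_1 terminating",
    "Connection refused while connecting to node 10.0.0.3",
    "Verification succeeded for blk_2",
]


def tokenize(message):
    """Lowercase a message and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", message.lower())


def build_index(logs):
    """Return (idf, vectors): smoothed IDF weights and one TF-IDF dict per log."""
    docs = [Counter(tokenize(m)) for m in logs]
    n = len(docs)
    # Document frequency: number of log messages containing each token.
    df = Counter(tok for d in docs for tok in d)
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    vectors = [{tok: tf * idf[tok] for tok, tf in d.items()} for d in docs]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, logs, k=2):
    """Return up to k log messages most similar to the question."""
    idf, vectors = build_index(logs)
    counts = Counter(tokenize(question))
    # Question tokens that never occur in the logs carry no weight.
    q_vec = {tok: tf * idf[tok] for tok, tf in counts.items() if tok in idf}
    scored = sorted(
        ((cosine(q_vec, v), i) for i, v in enumerate(vectors)),
        key=lambda s: (-s[0], s[1]),
    )
    return [logs[i] for score, i in scored[:k] if score > 0.0]


if __name__ == "__main__":
    for line in retrieve("Which node had a refused connection?", LOGS, k=1):
        print(line)
```

In a full pipeline of the kind the abstract describes, the retrieved messages would be passed to a reader model that extracts or generates the answer. A lexical scorer like this one matches surface tokens only, which is exactly the limitation that semantic representations are meant to address.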


Published In

HPDC '24: Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing
June 2024
436 pages
ISBN: 9798400704130
DOI: 10.1145/3625549
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. semantic-aware analysis
  2. log understanding
  3. natural language processing
  4. anomaly detection
  5. log parsing

Qualifiers

  • Short-paper

Conference

HPDC '24
Acceptance Rates

Overall Acceptance Rate 166 of 966 submissions, 17%
