DOI: 10.1145/3427796.3427808
research-article

LogFlow: Simplified Log Analysis for Large Scale Systems

Published: 05 January 2021

Abstract

Distributed infrastructures generate huge amounts of logs that can provide useful information about the state of the system, but these logs can be challenging to analyze. This paper presents LogFlow, a tool that helps human operators analyze logs by automatically constructing graphs of correlations between log entries. The core of LogFlow is an interpretable predictive model based on a Recurrent Neural Network augmented with a state-of-the-art attention layer, from which correlations between log entries are deduced. To handle huge amounts of data, LogFlow also relies on a new log parsing algorithm that can be orders of magnitude faster than the best existing log parsers. Experiments run on several system logs generated by supercomputers and cloud systems show that LogFlow achieves more than 96% accuracy in most cases.
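The abstract does not describe how LogFlow's parser works. As a rough, general illustration of what a log parser does, the sketch below turns raw lines into templates by masking variable fields, then maps each distinct template to an integer ID, the kind of input a sequence model consumes. The masking rule (any token containing a digit is a variable) and the names `is_variable`, `to_template` and `parse` are hypothetical assumptions for illustration, not LogFlow's algorithm.

```python
def is_variable(token: str) -> bool:
    """Hypothetical rule: treat any token containing a digit as a variable field
    (node numbers, IP addresses, hex addresses, counters)."""
    return any(ch.isdigit() for ch in token)


def to_template(line: str) -> str:
    """Replace variable-looking tokens with the wildcard '<*>'."""
    return " ".join("<*>" if is_variable(tok) else tok for tok in line.split())


def parse(lines):
    """Map each log line to an integer template ID.

    One dictionary lookup per line keeps parsing linear in the input size;
    a new template gets the next free ID the first time it is seen.
    """
    templates: dict[str, int] = {}
    ids = []
    for line in lines:
        tpl = to_template(line)
        ids.append(templates.setdefault(tpl, len(templates)))
    return ids, templates


if __name__ == "__main__":
    logs = [
        "node 12 temperature 85C exceeds threshold",
        "node 7 temperature 91C exceeds threshold",
        "link 3 down on switch sw-04",
    ]
    ids, templates = parse(logs)
    print(ids)  # [0, 0, 1]
    print(templates)
```

A digit-based rule is deliberately crude: it misses variables without digits (user names, file paths) and may mask constant tokens such as `ext4`. Published parsers such as Drain use richer heuristics, such as a fixed-depth parse tree.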
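The abstract states that correlations are deduced from the attention layer but does not give the rule. The following is a minimal sketch that assumes the "dot" scoring of Luong et al. (2015) and a hypothetical threshold rule. It scores each past entry's hidden state against the current query vector, normalizes the scores with a softmax, and adds an edge from every past template whose weight reaches the threshold to the predicted template. The hidden-state vectors, the 0.5 threshold, and the names `attention_weights` and `add_correlations` are illustrative assumptions. LogFlow's actual attention layer and correlation criterion may differ.

```python
import math
from collections import defaultdict


def attention_weights(query, keys):
    """Luong-style 'dot' attention: softmax over the dot products query . key_s."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def add_correlations(graph, window, weights, target, threshold=0.5):
    """Hypothetical rule: add an edge window[i] -> target for every past template
    whose attention weight is at least `threshold`.

    graph maps a source template ID to the set of template IDs correlated with it.
    """
    for template_id, weight in zip(window, weights):
        if weight >= threshold:
            graph[template_id].add(target)
    return graph


if __name__ == "__main__":
    # Made-up hidden states for three past log entries (template IDs 4, 9, 2).
    keys = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
    query = [1.0, 0.0]  # made-up state used to predict the next entry (template 7)
    w = attention_weights(query, keys)
    graph = add_correlations(defaultdict(set), [4, 9, 2], w, target=7)
    print([round(x, 3) for x in w])  # [0.114, 0.042, 0.844]
    print(dict(graph))  # {2: {7}}
```

Accumulating such edges over a whole log yields a directed graph between log templates. This is the kind of correlation graph the abstract says LogFlow builds for operators, though the exact construction here is an assumption.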

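Cited By

  • (2024) Landscape and Taxonomy of Online Parser-Supported Log Anomaly Detection Methods. IEEE Access 12 (78193-78218). DOI: 10.1109/ACCESS.2024.3387287. Online publication date: 2024.
  • (2024) A literature review and existing challenges on software logging practices. Empirical Software Engineering 29:4. DOI: 10.1007/s10664-024-10452-w. Online publication date: 18-Jun-2024.
  • (2023) Knowledge Extraction and Discovery about Web System Based on the Benchmark Application of Online Stock Trading System. Sensors 23:4 (2274). DOI: 10.3390/s23042274. Online publication date: 17-Feb-2023.
  • (2023) Failure Detection Using Semantic Analysis and Attention-Based Classifier Model for IT Infrastructure Log Data. IEEE Access 11 (108178-108197). DOI: 10.1109/ACCESS.2023.3319438. Online publication date: 2023.
  • (2023) Improving Classification-Based Log Analysis Using Vectorization Techniques. Proceedings of Third International Conference on Advances in Computer Engineering and Communication Systems (271-282). DOI: 10.1007/978-981-19-9228-5_24. Online publication date: 18-Mar-2023.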

Published In

ACM Other conferences
ICDCN '21: Proceedings of the 22nd International Conference on Distributed Computing and Networking
January 2021
252 pages
ISBN:9781450389334
DOI:10.1145/3427796
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICDCN '21
