A Confidence-Guided Evaluation for Log Parsers Inner Quality

Xie, Xueshuo; Wang, Zhi; Xiao, Xuhang; Lu, Ye; Huang, Shenwei; Li, Tao

doi:10.1007/s11036-019-01501-6

A Confidence-Guided Evaluation for Log Parsers Inner Quality

Published: 14 January 2020

Volume 26, pages 1638–1649, (2021)
Cite this article

Mobile Networks and Applications Aims and scope Submit manuscript

Xueshuo Xie¹,
Zhi Wang ORCID: orcid.org/0000-0002-3252-9254²,
Xuhang Xiao²,
Ye Lu²,
Shenwei Huang¹ &
…
Tao Li²

422 Accesses
3 Citations
3 Altmetric
Explore all metrics

Abstract

Logs faithfully record application behaviors and system states. Log parsing converts unstructured log messages into structured event templates by extracting the constant portion of raw logs. Log parsing is a prerequisite for further log analysis such as usage analysis, anomaly detection, performance modeling, and failure diagnosis. When processing logs with varied length, log parsing suffers from accuracy decreasing or the over-fitting problem. In addition, traditional probability-based accuracy assessment methods are ineffective in assessing log parsing inner quality, especially in understanding the reason about accuracy declining caused by varied length logs. In this paper we present a p_value-guided inner quality assessment on multiple log parsing algorithms. This method uses conformal evaluation to gain a deep insight of log parser quality. In this method, we choose the string edit distance algorithm as underlying non-conformity measure for conformal evaluation. We introduce two quality indicators to evaluate log parsers: credibility and confidence. The credibility reflects how conformal a log message to a event template generated by a log parser whereas the confidence reflects how non-conformal this log message to all other event templates. In order to demonstrate the inherent difference among different log parsers, we display the distribution of credibility and confidence of each prediction on tSNE 2D space. In the experiment, we evaluate 13 log parsers on different datasets. The results show that our approach could effectively demonstrate the inherent quality of log parsers and recognize variable-length problem compared to traditional confusion matrix based metrics.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

An Efficient Log Parsing Algorithm Based on Heuristic Rules

ML-Parser: An Efficient and Accurate Online Log Parser

Article 30 November 2022

Yu-Qian Zhu, Jia-Ying Deng, … Wei Wang

Self-supervised Log Parsing

References

Zhu J, He P, Fu Q, Zhang H, Lyu MR, Zhang D (2015) Learning to log: helping developers make informed logging decisions. In: Proceedings of the 37th international conference on software engineering, vol 1. IEEE Press, pp 415–425
Yuan D, Park S, Zhou Y (2012) Characterizing logging practices in open-source software. In: Proceedings of the 34th international conference on software engineering. IEEE Press, pp 102–112
Mi H, Wang H, Zhou Y, Lyu MR-T, Cai H (2013) Toward fine-grained, unsupervised, scalable performance diagnosis for production cloud computing systems. IEEE Trans Parallel Distrib Syst 24(6):1245–1255
Article Google Scholar
Oliner A, Ganapathi A, Xu W (2012) Advances and challenges in log analysis. Commun ACM 55 (2):55–61
Article Google Scholar
Xu W, Huang L, Fox A, Patterson D, Jordan MI (2009) Detecting large-scale system problems by mining console logs. In: Proceedings of the ACM SIGOPS 22nd symposium on operating systems principles. ACM, pp 117–132
Oprea A, Li Z, Yen T-F, Chin SH, Alrwais S (2015) Detection of early-stage enterprise infection by mining large-scale log data. In: 2015 45th Annual IEEE/IFIP international conference on dependable systems and networks (DSN). IEEE, pp 45–56
Makanju A, Zincir-Heywood AN, Milios EE (2010) Fast entropy based alert detection in super computer logs. In: 2010 International conference on dependable systems and networks workshops (DSN-W). IEEE, pp 52–58
Sangaiah AK, Medhane DV, Han T, Hossain MS, Muhammad G (2019) Enforcing position-based confidentiality with machine learning paradigm through mobile edge computing in real-time industrial informatics. IEEE Transactions on Industrial Informatics
Lee G, Lin J, Liu C, Lorek A, Ryaboy D (2012) The unified logging infrastructure for data analytics at twitter. Proce VLDB Endowm 5(12):1771–1780
Article Google Scholar
He S, Zhu J, He P, Lyu MR (2016) Experience report: system log analysis for anomaly detection. In: 2016 IEEE 27th International symposium on software reliability engineering (ISSRE). IEEE, pp 207–218
Bertero C, Roy M, Sauvanaud C, Trédan G (2017) Experience report: log mining using natural language processing and application to anomaly detection. In: 2017 IEEE 28th International symposium on software reliability engineering (ISSRE). IEEE, pp 351–360
Fu Q, Lou J-G, Wang Y, Li J (2009) Execution anomaly detection in distributed systems through unstructured log analysis. In: Ninth IEEE International conference on data mining, 2009. ICDM’09. IEEE, pp 149–158
Chow M, Meisner D, Flinn J, Peek D, Wenisch TF (2014) The mystery machine: end-to-end performance analysis of large-scale internet services. In: OSDI, pp 217–231
Nagaraj K, Killian C, Neville J (2012) Structured comparative analysis of systems logs to diagnose performance problems. In: Proceedings of the 9th USENIX conference on networked systems design and implementation. USENIX Association, pp 26–26
Yuan D, Mai H, Xiong W, Tan L, Zhou Y, Pasupathy S (2010) Sherlog: error diagnosis by connecting clues from run-time logs. In: ACM SIGARCH computer architecture news, vol 38, no. 1. ACM, pp 143–154
Xu X, Zhu L, Weber I, Bass L, Sun D (2014) Pod-diagnosis: error diagnosis of sporadic operations on cloud applications. In: 2014 44th Annual IEEE/IFIP international conference on dependable systems and networks (DSN). IEEE, pp 252–263
Wong WE, Debroy V, Golden R, Xu X, Thuraisingham B (2012) Effective software fault localization using an rbf neural network. IEEE Trans Reliab 61(1):149–169
Article Google Scholar
Lin Q, Zhang H, Lou J-G, Zhang Y, Chen X (2016) Log clustering based problem identification for online service systems. In: Proceedings of the 38th international conference on software engineering companion. ACM, pp 102–111
Du M, Li F, Zheng G, Srikumar V (2017) Deeplog: anomaly detection and diagnosis from system logs through deep learning. In: Proceedings of the 2017 ACM SIGSAC conference on computer and communications security. ACM, pp 1285– 1298
Vaarandi R (2003) A data clustering algorithm for mining patterns from event logs. In: 3rd IEEE Workshop on IP operations & management, 2003.(IPOM 2003). IEEE, pp 119–126
Nagappan M, Vouk MA (2010) Abstracting log lines to log event types for mining software system logs. In: 2010 7th IEEE Working conference on mining software repositories (MSR). IEEE, pp 114–117
Vaarandi R, Pihelgas M (2015) Logcluster-a data clustering and pattern mining algorithm for event logs. In: 2015 11th International conference on network and service management (CNSM). IEEE, pp 1–7
Tang L, Li T, Perng C-S (2011) Logsig: generating system events from raw textual logs. In: Proceedings of the 20th ACM international conference on information and knowledge management. ACM, pp 785–794
Hamooni H, Debnath B, Xu J, Zhang H, Jiang G, Mueen A (2016) Logmine: fast pattern recognition for log analytics. In: Proceedings of the 25th ACM international on conference on information and knowledge management. ACM, pp 1573– 1582
Mizutani M (2013) Incremental mining of system log format. In: 2013 IEEE International conference on services computing (SCC). IEEE, pp 595–602
Shima K (2016) Length matters: clustering system log messages using length of words. arXiv:1611.03213
Jiang ZM, Hassan AE, Flora P, Hamann G (2008) Abstracting execution logs to execution events for enterprise applications (short paper). In: The Eighth international conference on quality software, 2008. QSIC’08. IEEE, pp 181–186
Jiang ZM, Hassan AE, Hamann G, Flora P (2008) An automated approach for abstracting execution logs to execution events. J Softw Maint Evol Res Pract 20(4):249–267
Article Google Scholar
Makanju A, Zincir-Heywood AN, Milios EE (2012) A lightweight algorithm for message type extraction in system application logs. IEEE Trans Knowl Data Eng 24(11):1921–1936
Article Google Scholar
Min D, Li F (2017) Spell: streaming parsing of system event logs. In: IEEE International conference on data mining
He P, Zhu J, Zheng Z, Lyu MR (2017) Drain: An online log parsing approach with fixed depth tree. In: 2017 IEEE International conference on web services (ICWS). IEEE, pp 33–40
He P, Zhu J, Xu P, Zheng Z, Lyu MR (2018) A directed acyclic graph approach to online log parsing. arXiv:1806.04356
He P, Zhu J, He S, Li J, Lyu MR (2018) Towards automated log parsing for large-scale log data analysis. IEEE Trans Depend Secur Comput 15(6):931–944
Article Google Scholar
Messaoudi S, Panichella A, Bianculli D, Briand L, Sasnauskas R (2018) A search-based approach for accurate identification of log message formats. In: Proceedings of the 26th IEEE/ACM international conference on program comprehension (ICPC’18). ACM
He P, Zhu J, He S, Li J, Lyu MR (2016) An evaluation study on log parsing and its use in log mining. In: 2016 46th Annual IEEE/IFIP international conference on dependable systems and networks (DSN). IEEE, pp 654–661
Zhu J, He S, Liu J, He P, Xie Q, Zheng Z, Lyu MR (2018) Tools and benchmarks for automated log parsing, arXiv:1811.03509
Shafer G, Vovk V (2008) A tutorial on conformal prediction. J Mach Learn Res 9:371–421
MathSciNet MATH Google Scholar
Papadopoulos H, Vovk V, Gammermam A (2007) Conformal prediction with neural networks. In: Proceedings of the 19th IEEE international conference on tools with artificial intelligence - volume 02, ser. ICTAI ’07. [Online]. Available: https://doi.org/10.1109/ICTAI.2007.77. IEEE Computer Society, Washington, DC, pp 388–395
Papadopoulos H (2008) ch. Inductive conformal prediction: theory and application to neural networks
Jordaney R, Sharad K, Dash SK, Wang Z, Papini D, Nouretdinov I, Cavallaro L (2017) Transcend: Detecting concept drift in malware classification models. In: proceedings OF the 26th USENIX security symposium (USENIX SECURITY’17). USENIX Association, pp 625–642
Pendlebury F, Pierazzi F, Jordaney R, Kinder J, Cavallaro L (2019) Tesseract: eliminating experimental bias in malware classification across space and time. In: Proceedings of the USENIX security symposium USENIX
Maaten Lvd, Hinton G (2008) Visualizing data using t-sne. J Mach Learn Res 9:2579–2605
MATH Google Scholar
He P, Zhu J, He S, Li J (2015) Log parser. https://github.com/logpai/logp-arser/tree/dev
Fedorova V, Gammerman A, Nouretdinov I, Vovk V (2012) Plug-in martingales for testing exchangeability on-line. arXiv:1204.3251
Ho S-S (2005) A martingale framework for concept change detection in time-varying data streams. In: Proceedings of the 22nd international conference on machine learning. ACM, pp 321–327
He P (2017) An end-to-end log management framework for distributed systems. In: 2017 IEEE 36th symposium on reliable distributed systems (SRDS). IEEE, pp 266–267
Beschastnikh I, Brun Y, Schneider S, Sloan M, Ernst MD (2011) Leveraging existing instrumentation to automatically infer invariant-constrained models. In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on foundations of software engineering. ACM, pp 267–277
Shang W, Jiang ZM, Hemmati H, Adams B, Hassan AE, Martin P (2013) Assisting developers of big data analytics applications when deploying on hadoop clouds. In: 2013 35th International conference on software engineering (ICSE). IEEE, pp 402–411

Download references

Acknowledgments

This work is partially supported by the National Natural Science Foundation (61872200, 61872202, 11801284), the National Key Research and Development Program of China (2016YFC0400709, 2018YFB2100300), the Natural Science Foundation of Tianjin (18YFYZCG00060), the CERNET Innovation Project(NGII20180401).

Author information

Authors and Affiliations

College of Computer Science, Nankai University, Tianjin, 300350, China
Xueshuo Xie & Shenwei Huang
College of Cyber Science, Nankai University, Tianjin, 300350, China
Zhi Wang, Xuhang Xiao, Ye Lu & Tao Li

Authors

Xueshuo Xie
View author publications
You can also search for this author in PubMed Google Scholar
Zhi Wang
View author publications
You can also search for this author in PubMed Google Scholar
Xuhang Xiao
View author publications
You can also search for this author in PubMed Google Scholar
Ye Lu
View author publications
You can also search for this author in PubMed Google Scholar
Shenwei Huang
View author publications
You can also search for this author in PubMed Google Scholar
Tao Li
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Zhi Wang.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Xie, X., Wang, Z., Xiao, X. et al. A Confidence-Guided Evaluation for Log Parsers Inner Quality. Mobile Netw Appl 26, 1638–1649 (2021). https://doi.org/10.1007/s11036-019-01501-6

Download citation

Published: 14 January 2020
Issue Date: August 2021
DOI: https://doi.org/10.1007/s11036-019-01501-6

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A Confidence-Guided Evaluation for Log Parsers Inner Quality

Abstract

Access this article

Similar content being viewed by others

An Efficient Log Parsing Algorithm Based on Heuristic Rules

ML-Parser: An Efficient and Accurate Online Log Parser

Self-supervised Log Parsing

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

A Confidence-Guided Evaluation for Log Parsers Inner Quality

Abstract

Access this article

Similar content being viewed by others

An Efficient Log Parsing Algorithm Based on Heuristic Rules

ML-Parser: An Efficient and Accurate Online Log Parser

Self-supervised Log Parsing

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation