Online log parsing using evolving research tree

  • Regular Paper
Knowledge and Information Systems

Abstract

Logs are a reliable source of information for development and maintenance purposes. They record information about the state of a system at runtime and are commonly used to analyze its behavior. Log parsing structures the information embedded within log messages and is a crucial step for many log mining applications. In such use cases, parsing effectiveness can affect the performance of these applications, and for systems that require real-time processing, parsing efficiency is also an important factor. In this paper, we present USTEP, an online log parser that uses an evolving tree structure to encode and discover new parsing rules on the fly. Our evaluation on 14 datasets from different logging environments shows that our method is more robust and effective than the state of the art. Our analysis of space and time complexity shows that USTEP is the only considered method capable of processing logs in constant time regardless of their length. We also propose USTEP-UP, a way of running multiple USTEP instances in parallel.
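The sketch below shows what a parser produces. Each message maps to a template in which variable fields are replaced by the wildcard `<*>`, a convention common in log parsing benchmarks such as [10]. The parameter values that fill those slots are returned alongside it.

The following are illustrative assumptions:
- The HDFS-style message and template are examples.
- `extract_parameters` is a hypothetical helper.

The helper only matches a message against an already known template. It does not show how USTEP discovers templates.

```python
import re
from typing import List, Optional

WILDCARD = "<*>"


def extract_parameters(template: str, message: str) -> Optional[List[str]]:
    """Return the value filling each <*> slot, or None if the message does not fit."""
    # Escape the constant parts of the template and turn every wildcard
    # into a non-greedy capture group; fullmatch anchors both ends.
    parts = template.split(WILDCARD)
    pattern = "(.*?)".join(re.escape(part) for part in parts)
    match = re.fullmatch(pattern, message)
    return list(match.groups()) if match else None


if __name__ == "__main__":
    template = "Received block <*> of size <*> from <*>"
    message = "Received block blk_-1608999687919862906 of size 91178 from /10.250.10.6"
    print(extract_parameters(template, message))
    # ['blk_-1608999687919862906', '91178', '/10.250.10.6']
```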

Notes

  1. https://www.splunk.com/.

  2. https://www.elastic.co/fr/whatis/elkstack/.

  3. https://github.com/outscale-dev/ustep-online-log-parser.

References

  1. Vervaet A, Chiky R, Callau-Zori M (2021) Ustep: unfixed search tree for efficient log parsing. In: 2021 IEEE international conference on data mining (ICDM)

  2. Armbrust M, Fox A, Griffith R, Joseph AD, Katz R, Konwinski A, Lee G, Patterson D, Rabkin A, Stoica I et al (2010) A view of cloud computing. Commun ACM 53(4):50–58

  3. Gartner (2021) 3 Cloud disciplines to fuel digital innovation. https://www.gartner.com/smarterwithgartner/3-cloud-disciplines-to-fuel-digital-innovation

  4. Varghese B, Buyya R (2018) Next generation cloud computing: new trends and research directions. Futur Gener Comput Syst 79:849–861

  5. He S, He P, Chen Z, Yang T, Su Y, Lyu MR (2021) A survey on automated log analysis for reliability engineering. ACM Comput Surv (CSUR) 54(6):1–37

  6. Zeng L, Xiao Y, Chen H, Sun B, Han W (2016) Computer operating system logging and security issues: a survey. Secur Commun Netw 9(17):4804–4821

  7. Mi H, Wang H, Zhou Y, Lyu MR-T, Cai H (2013) Toward fine-grained, unsupervised, scalable performance diagnosis for production cloud computing systems. IEEE Trans Parallel Distrib Syst 24(6):1245–1255

  8. Liang H, Song L, Wang J, Guo L, Li X, Liang J (2021) Robust unsupervised anomaly detection via multi-time scale DCGANs with forgetting mechanism for industrial multivariate time series. Neurocomputing 423:444–462

  9. Du M, Li F, Zheng G, Srikumar V (2017) Deeplog: anomaly detection and diagnosis from system logs through deep learning. In: Proceedings of the 2017 ACM SIGSAC conference on computer and communications security, pp 1285–1298

  10. Zhu J, He S, Liu J, He P, Xie Q, Zheng Z, Lyu MR (2019) Tools and benchmarks for automated log parsing. In: 2019 IEEE/ACM 41st international conference on software engineering: software engineering in practice (ICSE-SEIP). IEEE, pp 121–130

  11. Mizutani M (2013) Incremental mining of system log format. In: 2013 IEEE international conference on services computing. IEEE, pp 595–602

  12. Shima K (2016) Length matters: clustering system log messages using length of words. arXiv preprint arXiv:1611.03213

  13. Du M, Li F (2016) Spell: streaming parsing of system event logs. In: 2016 IEEE 16th international conference on data mining (ICDM). IEEE, pp 859–864

  14. He P, Zhu J, Zheng Z, Lyu MR (2017) Drain: an online log parsing approach with fixed depth tree. In: 2017 IEEE international conference on web services (ICWS). IEEE, pp 33–40

  15. Fu Q, Lou J-G, Wang Y, Li J (2009) Execution anomaly detection in distributed systems through unstructured log analysis. In: 2009 Ninth IEEE international conference on data mining. IEEE, pp 149–158

  16. Tang L, Li T, Perng C-S (2011) Logsig: Generating system events from raw textual logs. In: Proceedings of the 20th ACM international conference on information and knowledge management, pp 785–794

  17. Hamooni H, Debnath B, Xu J, Zhang H, Jiang G, Mueen A (2016) Logmine: fast pattern recognition for log analytics. In: Proceedings of the 25th ACM international on conference on information and knowledge management, pp 1573–1582

  18. Makanju AA, Zincir-Heywood AN, Milios EE (2009) Clustering event logs using iterative partitioning. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1255–1264

  19. Vaarandi R (2003) A data clustering algorithm for mining patterns from event logs. In: Proceedings of the 3rd IEEE workshop on IP operations & management (IPOM 2003) (IEEE Cat. No. 03EX764). IEEE, pp 119–126

  20. Nagappan M, Vouk MA (2010) Abstracting log lines to log event types for mining software system logs. In: 2010 7th IEEE working conference on mining software repositories (MSR 2010). IEEE, pp 114–117

  21. Vaarandi R, Pihelgas M (2015) Logcluster: a data clustering and pattern mining algorithm for event logs. In: 2015 11th international conference on network and service management (CNSM). IEEE, pp 1–7

  22. Jiang ZM, Hassan AE, Flora P, Hamann G (2008) Abstracting execution logs to execution events for enterprise applications (short paper). In: 2008 The eighth international conference on quality software. IEEE, pp 181–186

  23. Dai H, Li H, Shang W, Chen T-H, Chen C-S (2020) Logram: efficient log parsing using n-gram dictionaries. arXiv preprint arXiv:2001.03038

  24. Nedelkoski S, Bogatinovski J, Acker A, Cardoso J, Kao O (2020) Self-supervised log parsing. arXiv preprint arXiv:2003.07905

  25. He P, Zhu J, Xu P, Zheng Z, Lyu MR (2018) A directed acyclic graph approach to online log parsing

  26. He P, Zhu J, He S, Li J, Lyu MR (2017) Towards automated log parsing for large-scale log data analysis. IEEE Trans Dependable Secur Comput 15(6):931–944

  27. Agrawal A, Karlupia R, Gupta R (2019) Logan: a distributed online log parser. In: 2019 IEEE 35th international conference on data engineering (ICDE). IEEE, pp 1946–1951

  28. Pang G, Shen C, Cao L, Hengel AVD (2021) Deep learning for anomaly detection: a review. ACM Comput Surv (CSUR) 54(2):1–38

  29. Xu W, Huang L, Fox A, Patterson D, Jordan MI (2009) Detecting large-scale system problems by mining console logs. In: Proceedings of the ACM SIGOPS 22nd symposium on operating systems principles, pp 117–132

  30. Lou J-G, Fu Q, Yang S, Xu Y, Li J (2010) Mining invariants from console logs for system problem detection. In: USENIX annual technical conference, pp 1–14

  31. Zhang X, Xu Y, Lin Q, Qiao B, Zhang H, Dang Y, Xie C, Yang X, Cheng Q, Li Z et al (2019) Robust log-based anomaly detection on unstable log data. In: Proceedings of the 2019 27th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering, pp 807–817

  32. Nedelkoski S, Bogatinovski J, Acker A, Cardoso J, Kao O (2020) Self-attentive classification-based anomaly detection in unstructured logs. In: 2020 IEEE international conference on data mining (ICDM), pp 1196–1201. https://doi.org/10.1109/ICDM50108.2020.00148

  33. Kimura T, Watanabe A, Toyono T, Ishibashi K (2018) Proactive failure detection learning generation patterns of large-scale network logs. IEICE Trans Commun

  34. Lu S, Rao B, Wei X, Tak B, Wang L, Wang L (2017) Log-based abnormal task detection and root cause analysis for spark. In: 2017 IEEE international conference on web services (ICWS). IEEE, pp 389–396

  35. Anitha V, Isakki P (2016) A survey on predicting user behavior based on web server log files in a web usage mining. In: 2016 International conference on computing technologies and intelligent data engineering (ICCTIDE’16), pp 1–4. https://doi.org/10.1109/ICCTIDE.2016.7725340

  36. Awad M, Menascé DA (2015) Automatic workload characterization using system log analysis. In: Computer measurement group conference on performance and capacity, San Antonio, TX

  37. He P, Zhu J, He S, Li J, Lyu MR (2016) An evaluation study on log parsing and its use in log mining. In: 2016 46th annual IEEE/IFIP international conference on dependable systems and networks (DSN). IEEE, pp 654–661

  38. He S, Zhu J, He P, Lyu MR (2020) Loghub: a large collection of system log datasets towards automated log analytics. arXiv preprint arXiv:2008.06448

  39. Ghomi EJ, Rahmani AM, Qader NN (2017) Load-balancing algorithms in cloud computing: a survey. J Netw Comput Appl 88:50–71

  40. Mishra SK, Sahoo B, Parida PP (2020) Load balancing in cloud computing: a big picture. J King Saud Univ Comput Inf Sci 32(2):149–158

Acknowledgements

The work described in this paper was supported by the cloud provider 3DS OUTSCALE and by the French National Research and Technology Association (CIFRE program N° 2020/0289). We warmly thank both for their support.

Author information

Corresponding author

Correspondence to Arthur Vervaet.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Preliminary results were published at the 2021 IEEE International Conference on Data Mining (ICDM) [1].

Appendix A: Details about parsing experimental settings

Preprocessing with regular expressions (regex) helps log parsers achieve more accurate results. During the evaluation, we applied the same regex to all the parsers on a given dataset. For every algorithm, the parameter values were fine-tuned over more than 100 runs to avoid bias from randomization. We kept the values for which the algorithm achieved the highest accuracy on a given dataset. The preprocessing regex for each dataset and the parameters for each parser are summarized in Table 5.
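The protocol above can be sketched as follows. It assumes two things:
- Each dataset has a list of regex rules, and every match is replaced by `<*>` before any parser sees a message.
- Tuning is an exhaustive grid search that keeps the highest-accuracy setting.

The `PREPROCESS_REGEX` table and its two HDFS patterns (block IDs, and IP addresses with an optional port) are illustrative assumptions, as is the `evaluate` callback. The actual rules and values are those of Table 5, and the text does not specify the search strategy.

```python
import itertools
import re
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical per-dataset rules: every match is replaced by <*> before parsing,
# and the same list is applied to all parsers evaluated on that dataset.
PREPROCESS_REGEX: Dict[str, List[str]] = {
    "HDFS": [r"blk_-?\d+", r"(\d+\.){3}\d+(:\d+)?"],
}


def preprocess(message: str, patterns: Sequence[str]) -> str:
    """Mask known variable fields (IDs, addresses, ...) with the wildcard token."""
    for pattern in patterns:
        message = re.sub(pattern, "<*>", message)
    return message


def grid_search(
    evaluate: Callable[[Dict[str, float]], float],
    grid: Dict[str, Sequence[float]],
) -> Tuple[Dict[str, float], float]:
    """Try every parameter combination and keep the one with the highest accuracy."""
    names = list(grid)
    best_params: Dict[str, float] = {}
    best_acc = float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        acc = evaluate(params)  # e.g., parsing accuracy of one run on the dataset
        if acc > best_acc:
            best_params, best_acc = params, acc
    return best_params, best_acc


if __name__ == "__main__":
    line = "Received block blk_-1608999687919862906 of size 91178 from /10.250.10.6"
    print(preprocess(line, PREPROCESS_REGEX["HDFS"]))
    # Received block <*> of size 91178 from /<*>
```

Because masking happens before parsing, every parser receives identical input on a given dataset. Only the parser-specific parameters vary between runs.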

Regarding the number of parameters, SHISO requires four: (1) maxChild, the maximum number of children for each internal node; (2) mergeThreshold, a threshold for searching the most similar template among the children; (3) formatLookupThreshold, a lower bound used to find the most similar node to adjust; and (4) superFormatThreshold, the threshold on the average LCS length that determines whether a super format needs to be created. LenMa uses only one parameter, \(T_c\), the threshold for similarity comparisons between the log message and the clusters. Spell also requires only one parameter, \(\tau\), as a similarity threshold. Finally, Drain needs three parameters [14]: (1) depth, the depth of the parsing tree; (2) st, a threshold for similarity comparisons between the log messages and the discovered templates; and (3) maxChild, the maximum number of children that a node can have. Once this limit is reached, every new value is sent to a default node. In a later version [25], the number of parameters is reduced to one, st, and a dynamic update mechanism is proposed (Table 6).
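The single thresholds of LenMa (\(T_c\)), Spell (\(\tau\)) and Drain (st) each gate a similarity score. The sketch below shows simplified versions of these scores:
- Drain [14]: positional token agreement.
- Spell [13]: longest common subsequence (LCS) length.
- LenMa [12]: cosine similarity of word-length vectors.

These are hedged simplifications, not the full algorithms. The following choices are our assumptions:
- \(\tau\) is a fraction of the message length.
- Wildcard positions are ignored in the Drain score.
- LenMa compares only messages with equal word counts.

Tree parameters such as depth and maxChild are not modeled.

```python
import math
from typing import List, Sequence

WILDCARD = "<*>"


def drain_similarity(template: List[str], tokens: List[str]) -> float:
    """Fraction of positions where the message token equals the template token.

    Simplified from Drain [14]: wildcard positions never count as matches, and
    a template of a different length is never similar.
    """
    if not tokens or len(template) != len(tokens):
        return 0.0
    same = sum(1 for t, m in zip(template, tokens) if t != WILDCARD and t == m)
    return same / len(tokens)


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b):
            curr.append(prev[j] + 1 if x == y else max(prev[j + 1], curr[j]))
        prev = curr
    return prev[-1]


def spell_matches(lcs_tokens: List[str], tokens: List[str], tau: float = 0.5) -> bool:
    """Assumed Spell-style test: the LCS must cover at least tau * len(message)."""
    return lcs_length(lcs_tokens, tokens) >= tau * len(tokens)


def lenma_similarity(cluster_lengths: List[int], tokens: List[str]) -> float:
    """Cosine similarity between word-length vectors (simplified from LenMa [12])."""
    lengths = [len(t) for t in tokens]
    if len(lengths) != len(cluster_lengths):
        return 0.0
    dot = sum(a * b for a, b in zip(cluster_lengths, lengths))
    norm = math.sqrt(sum(a * a for a in cluster_lengths)) * math.sqrt(sum(b * b for b in lengths))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    msg = "Received block blk_1 of size 100".split()
    template = "Received block <*> of size <*>".split()
    st, tau, t_c = 0.5, 0.5, 0.9  # hypothetical threshold values
    print(drain_similarity(template, msg) >= st)
    print(spell_matches(["Received", "block", "of", "size"], msg, tau))
    print(lenma_similarity([8, 5, 5, 2, 4, 3], msg) >= t_c)
```

In each case, raising the threshold makes the parser more conservative. Fewer messages are merged into an existing template, so more templates are created.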

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Vervaet, A., Callau-Zori, M., Chabchoub, Y. et al. Online log parsing using evolving research tree. Knowl Inf Syst 66, 1231–1255 (2024). https://doi.org/10.1007/s10115-023-01953-z

  • DOI: https://doi.org/10.1007/s10115-023-01953-z
