ML-Parser: An Efficient and Accurate Online Log Parser

  • Regular Paper
Journal of Computer Science and Technology

Abstract

A log is a text message generated by services, frameworks, and programs. Most log data mining tasks rely on log parsing as the first step, which transforms raw logs into formatted log templates. Existing log parsing approaches often fail to handle the trade-off between parsing quality and performance effectively. To address this, we present Multi-Layer Parser (ML-Parser), an online log parser that runs in a streaming manner. Specifically, we use a multi-layer structure in log parsing to strike a balance between efficiency and effectiveness: coarse-grained tokenization and a fast similarity measure are applied for efficiency, while fine-grained tokenization and an accurate similarity measure are used for effectiveness. In experiments, we compare ML-Parser with two existing online log parsing approaches, Drain and Spell, on ten real-world datasets, five labeled and five unlabeled. On the five labeled datasets, we measure accuracy as the proportion of correctly parsed logs, and ML-Parser achieves the highest accuracy on four of them. On all ten datasets, we use the Loss metric to measure parsing quality. ML-Parser achieves the highest quality on seven of the ten datasets while maintaining relatively high efficiency.
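
The layered idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which tokenizers or similarity measures ML-Parser uses, so the choices here are assumptions. Whitespace splitting stands in for coarse-grained tokenization, delimiter splitting for fine-grained tokenization, Jaccard similarity for the fast measure, and a longest-common-subsequence ratio for the accurate one. The class name `TwoLayerParser` and both thresholds are hypothetical.

```python
import re


def coarse_tokens(line):
    # Coarse-grained tokenization (assumed): split on whitespace only.
    return line.split()


def fine_tokens(line):
    # Fine-grained tokenization (assumed): also split on common delimiters.
    return [t for t in re.split(r"[\s=:,()\[\]]+", line) if t]


def jaccard(a, b):
    # Fast, order-insensitive similarity between two token lists.
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def lcs_length(a, b):
    # Classic O(len(a) * len(b)) dynamic program, keeping only two rows.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_similarity(a, b):
    # Slower, order-sensitive similarity in [0, 1].
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


class TwoLayerParser:
    """Hypothetical streaming parser: a cheap filter, then a costlier check."""

    def __init__(self, fast_threshold=0.3, accurate_threshold=0.6):
        # Both thresholds are illustrative values, not taken from the paper.
        self.fast_threshold = fast_threshold
        self.accurate_threshold = accurate_threshold
        self.groups = []  # one representative log line per group

    def parse(self, line):
        """Return the group id for `line`, creating a new group if needed."""
        best_id, best_score = None, 0.0
        for gid, rep in enumerate(self.groups):
            # Layer 1: skip groups that fail the fast coarse-grained test.
            if jaccard(coarse_tokens(line), coarse_tokens(rep)) < self.fast_threshold:
                continue
            # Layer 2: score the surviving candidates on fine-grained tokens.
            score = lcs_similarity(fine_tokens(line), fine_tokens(rep))
            if score >= self.accurate_threshold and score > best_score:
                best_id, best_score = gid, score
        if best_id is None:
            self.groups.append(line)
            best_id = len(self.groups) - 1
        return best_id
```

In this sketch the fast layer only prunes candidates, so the more expensive comparison runs on a small subset of groups. That is one plausible way to trade efficiency against accuracy, and the paper's actual layering may differ.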

References

  1. Zhu J, He S, Liu J, He P, Xie Q, Zheng Z, Lyu M R. Tools and benchmarks for automated log parsing. In Proc. the 41st IEEE/ACM International Conference on Software Engineering: Software Engineering in Practice, May 2019, pp.121-130. DOI: 10.1109/ICSE-SEIP.2019.00021.

  2. He P, Zhu J, Zheng Z, Lyu M R. Drain: An online log parsing approach with fixed depth tree. In Proc. the 2017 IEEE International Conference on Web Services, June 2017, pp.33-40. DOI: 10.1109/ICWS.2017.13.

  3. Du M, Li F. Spell: Streaming parsing of system event logs. In Proc. the 2016 International Conference on Data Mining, Dec. 2016, pp.859-864. DOI: 10.1109/ICDM.2016.0103.

  4. Agrawal A, Karlupia R, Gupta R. Logan: A distributed online log parser. In Proc. the 35th IEEE International Conference on Data Engineering, April 2019, pp.1946-1951. DOI: 10.1109/ICDE.2019.00211.

  5. Agrawal A, Dixit A, Kapadia D, Karlupia R, Agrawal V, Gupta R. Delog: A privacy preserving log filtering framework for online compute platforms. arXiv:1902.04843, 2019. https://arxiv.org/abs/1902.04843, Jan. 2021.

  6. Xu W, Huang L, Fox A, Patterson D, Jordan M I. Detecting large-scale system problems by mining console logs. In Proc. the 22nd ACM SIGOPS Symposium on Operating Systems Principles, October 2009, pp.117-132. DOI: 10.1145/1629575.1629587.

  7. Vaarandi R. A data clustering algorithm for mining patterns from event logs. In Proc. the 3rd IEEE Workshop on IP Operations & Management, Oct. 2003, pp.119-126. DOI: 10.1109/IPOM.2003.1251233.

  8. Tang L, Li T, Perng C S. LogSig: Generating system events from raw textual logs. In Proc. the 20th ACM International Conference on Information and Knowledge Management, October 2011, pp.785-794. DOI: 10.1145/2063576.2063690.

  9. Fu Q, Lou J G, Wang Y, Li J. Execution anomaly detection in distributed systems through unstructured log analysis. In Proc. the 9th IEEE International Conference on Data Mining, Dec. 2009, pp.149-158. DOI: 10.1109/ICDM.2009.60.

  10. Makanju A A, Zincir-Heywood A N, Milios E E. Clustering event logs using iterative partitioning. In Proc. the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, June 2009, pp.1255-1264. DOI: 10.1145/1557019.1557154.

  11. Makanju A, Zincir-Heywood A N, Milios E E. A lightweight algorithm for message type extraction in system application logs. IEEE Transactions on Knowledge and Data Engineering, 2011, 24(11): 1921-1936. DOI: 10.1109/TKDE.2011.138.

  12. Hamooni H, Debnath B, Xu J, Zhang H, Jiang G, Mueen A. LogMine: Fast pattern recognition for log analytics. In Proc. the 25th ACM International on Conference on Information and Knowledge Management, October 2016, pp.1573-1582. DOI: 10.1145/2983323.2983358.

  13. Shima K. Length matters: Clustering system log messages using length of words. arXiv:1611.03213, 2016. https://arxiv.org/abs/1611.03213, Jan. 2021.

  14. Levandowsky M, Winter D. Distance between sets. Nature, 1971, 234(5323): 34-35. DOI: 10.1038/234034a0.

  15. Nakatsu N, Kambayashi Y, Yajima S. A longest common subsequence algorithm suitable for similar text strings. Acta Informatica, 1982, 18(2): 171-179. DOI: 10.1007/BF00264437.

  16. He P, Zhu J, He S, Li J, Lyu M R. An evaluation study on log parsing and its use in log mining. In Proc. the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, June 28-July 1, 2016, pp.654-661. DOI: 10.1109/DSN.2016.66.

  17. Yang Y, Zhang W, Zhang Y, Lin X, Wang L. Selectivity estimation on set containment search. Data Science and Engineering, 2019, 4(3): 254-268. DOI: 10.1007/s41019-019-00104-1.

  18. He P, Zhu J, Xu P, Zheng Z, Lyu M R. A directed acyclic graph approach to online log parsing. arXiv:1806.04356, 2018. https://arxiv.org/abs/1806.04356, Jan. 2021.

Author information

Corresponding author

Correspondence to Peng Wang.

Supplementary Information

ESM 1

(PDF 362 kb)

About this article

Cite this article

Zhu, YQ., Deng, JY., Pu, JC. et al. ML-Parser: An Efficient and Accurate Online Log Parser. J. Comput. Sci. Technol. 37, 1412–1426 (2022). https://doi.org/10.1007/s11390-021-0730-4

  • DOI: https://doi.org/10.1007/s11390-021-0730-4
