FastLogSim: A Quick Log Pattern Parser Scheme Based on Text Similarity

  • Conference paper
Knowledge Science, Engineering and Management (KSEM 2020)

Abstract

Logs record system events in full and can be used to reveal network security issues and analyse user behaviour. Because logs are stored as unstructured data and there is no universal log retention standard, they can hardly be analysed directly. Most existing log parsers focus on parsing accuracy and ignore time performance when parsing large volumes of logs. This paper therefore proposes FastLogSim, a fast log parsing scheme based on text similarity. To reduce the parsing workload, we remove the key variable values from the logs and then deduplicate the results to obtain templates. Subsequently, similarity is computed between templates so that similar templates can be merged into log patterns. FastLogSim not only reduces the number of templates that need to be parsed from tens of millions to dozens, but also greatly improves the speed of pattern extraction. We evaluated FastLogSim on four real public log datasets. The experimental results show that when FastLogSim processes tens of thousands of logs, it takes almost the same time as mainstream log parsers. However, when the number of logs exceeds ten million, FastLogSim is three times faster than previous state-of-the-art parsers. Hence, FastLogSim is appropriate for large-scale log pattern mining.
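The pipeline described in the abstract (mask variable values, deduplicate into templates, merge similar templates into patterns) can be illustrated with a minimal sketch. This is not the authors' implementation: the masking regexes, the `<*>` wildcard, the 0.7 threshold, the equal-length merge rule, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration, since the abstract does not specify them.

```python
import re
from difflib import SequenceMatcher

# Hypothetical masking rules; the paper's actual variable patterns are not given here.
VARIABLE_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"),  # IPv4 address, optional port
    re.compile(r"\b0x[0-9a-fA-F]+\b"),                     # hexadecimal literals
    re.compile(r"\b\d+\b"),                                # plain integers
]
WILDCARD = "<*>"  # assumed placeholder for a variable field


def to_template(line: str) -> str:
    """Replace variable values with a wildcard and normalise whitespace."""
    for pattern in VARIABLE_PATTERNS:
        line = pattern.sub(WILDCARD, line)
    return " ".join(line.split())


def similarity(a: str, b: str) -> float:
    """Token-level similarity in [0, 1]; a stand-in for the paper's measure."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def merge(a: str, b: str) -> str:
    """Keep tokens that agree position by position, wildcard the rest."""
    return " ".join(x if x == y else WILDCARD
                    for x, y in zip(a.split(), b.split()))


def extract_patterns(lines, threshold=0.7):
    """Mask, deduplicate, then greedily merge similar templates."""
    # dict.fromkeys deduplicates while preserving first-seen order.
    templates = list(dict.fromkeys(to_template(line) for line in lines))
    patterns = []
    for template in templates:
        for i, pattern in enumerate(patterns):
            same_length = len(pattern.split()) == len(template.split())
            if same_length and similarity(pattern, template) >= threshold:
                patterns[i] = merge(pattern, template)
                break
        else:
            # No existing pattern was similar enough: start a new one.
            patterns.append(template)
    return patterns
```

The point of deduplicating before the similarity step is that the pairwise comparison then runs over distinct templates rather than raw log lines, which is where the abstract locates the speed-up on large inputs.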

Supported by the Department of Science and Technology of Jilin Province grant NO. 20190302070GX, the Education Department of Jilin Province grant NO. JJKH20190598KJ, Jilin Education Science Planning Project (GH180148) and Jilin Province College and University “Golden Course” Plan Project (Network protocol and network virus virtual simulation experiment).

Corresponding author

Correspondence to Xiaoqiang Di.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, W., Liu, X., Di, X., Cai, B. (2020). FastLogSim: A Quick Log Pattern Parser Scheme Based on Text Similarity. In: Li, G., Shen, H., Yuan, Y., Wang, X., Liu, H., Zhao, X. (eds) Knowledge Science, Engineering and Management. KSEM 2020. Lecture Notes in Computer Science, vol 12274. Springer, Cham. https://doi.org/10.1007/978-3-030-55130-8_19

  • DOI: https://doi.org/10.1007/978-3-030-55130-8_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-55129-2

  • Online ISBN: 978-3-030-55130-8

  • eBook Packages: Computer Science, Computer Science (R0)
