
Grouping Test Results with the Common Root Cause Using String Similarity Algorithms

Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1431)

Abstract

This paper presents an innovative approach to test log categorization. It shows how five string similarity algorithms (Python's built-in difflib library, the Jaccard index, Jaro-Winkler distance, cosine similarity, and the Levenshtein ratio) can be applied to test logs from pytest, a popular assertion-based test framework. To reduce the manual, error-prone work software engineers face when analyzing multiple test runs every day, the test logs are grouped into three distinct categories that ease root-cause analysis and fixing: C1, similar failures in the same test; C2, a similar failure in two different tests; and C3, different failures in two instances of the same test. The presented work demonstrates the effectiveness of string similarity algorithms for this task.
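The abstract names five similarity measures and three categories without showing how they combine. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it computes each measure between two failure messages using only the Python standard library, then assigns a pair of failures to C1, C2 or C3. The word tokenization, the Levenshtein normalization (1 minus distance over the longer length, one common convention), the hypothetical `FailureRecord` type, the hypothetical `categorize` helper and the 0.8 threshold are all assumptions made for illustration.

```python
import difflib
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


def tokens(text: str) -> list[str]:
    """Split a log line into lowercase word tokens (assumed tokenization)."""
    return re.findall(r"\w+", text.lower())


def difflib_ratio(a: str, b: str) -> float:
    """Python's built-in difflib: 2*M/T over matching character blocks."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def jaccard(a: str, b: str) -> float:
    """Jaccard index of the two token sets: |A & B| / |A | B|."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro similarity boosted by a common prefix of up to 4 characters."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(0, max(len(s1), len(s2)) // 2 - 1)
    m1, m2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not m2[j] and s2[j] == ch:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the matched characters that appear out of order.
    a_seq = [c for c, used in zip(s1, m1) if used]
    b_seq = [c for c, used in zip(s2, m2) if used]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) / 2
    jaro = (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3
    prefix = 0
    for x, y in zip(s1[:4], s2[:4]):
        if x != y:
            break
        prefix += 1
    return jaro + prefix * p * (1 - jaro)


def cosine(a: str, b: str) -> float:
    """Cosine similarity of token-count vectors."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def levenshtein_ratio(a: str, b: str) -> float:
    """1 - edit distance / longer length (assumed normalization)."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return 1 - prev[-1] / max(len(a), len(b))


@dataclass
class FailureRecord:
    """Hypothetical record: a pytest node id plus its failure message."""
    test_id: str
    message: str


def categorize(a: FailureRecord, b: FailureRecord,
               metric: Callable[[str, str], float] = difflib_ratio,
               threshold: float = 0.8) -> Optional[str]:
    """Assign a pair of failures to C1/C2/C3 (hypothetical threshold)."""
    similar = metric(a.message, b.message) >= threshold
    same_test = a.test_id == b.test_id
    if same_test and similar:
        return "C1"  # similar failures in the same test
    if similar:
        return "C2"  # similar failure in two different tests
    if same_test:
        return "C3"  # different failures in the same test
    return None      # different tests with different failures: not grouped
```

Comparing only the failure message keeps the example short; real pytest logs would likely need normalizing first (for example, stripping timestamps or memory addresses) so that incidental differences do not lower the scores. Any of the five functions can be passed as `metric` to compare how they group the same pair of runs.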


References

  1. Uspenskij, M.B.: Log mining and knowledge-based models in data storage systems diagnostics. In: E3S Web of Conferences, vol. 140, p. 03006 (2019)


  2. Splunk—Turn Data Into Doing. https://www.splunk.com/

  3. Sematext Logs—Cloud Log Management Service—Hosted ELK. https://sematext.com/logsene/

  4. He, P., Zhu, J., Zheng, Z., Lyu, M.R.: Drain: an online log parsing approach with fixed depth tree. In: 2017 IEEE International Conference on Web Services (ICWS), pp. 33–40. IEEE (2017)


  5. Du, M., Li, F.: Spell: online streaming parsing of large unstructured system logs. In: IEEE Transactions on Knowledge and Data Engineering, vol. 31, no. 11, pp. 2213–2227 (2019). https://ieeexplore.ieee.org/document/8489912/

  6. Ren, Y., et al.: System log detection model based on conformal prediction. Electronics 9(2), 232 (2020)


  7. Du, M., Li, F., Zheng, G., Srikumar, V.: DeepLog. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, New York, NY, USA, pp. 1285–1298. ACM (2017)


  8. Continuous Integration. https://martinfowler.com/articles/continuousIntegration.html

  9. Durieux, T., Abreu, R., Monperrus, M., Bissyandé, T.F., Cruz, L.: An analysis of 35+ million jobs of Travis CI. In: 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME) (2019). https://travis-ci.org

  10. difflib - Helpers for computing deltas (2022). https://docs.python.org/3/library/difflib.html

  11. Cohen, W.W., Ravikumar, P., Fienberg, S.: A comparison of string metrics for matching names and records (2003)


  12. Jaccard, P.: The distribution of the flora in the alpine zone. New Phytol. 11(2), 37–50 (1912)


  13. Han, J., Kamber, M., Pei, J.: Getting to know your data. In: Data Mining, pp. 39–82. Elsevier (2012). https://linkinghub.elsevier.com/retrieve/pii/B9780123814791000022

  14. Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. In: Soviet Physics Doklady, vol. 10, no. 8, pp. 707–710 (1966)


  15. Home—WithSecure™. https://www.withsecure.com/en/home

  16. pytest: helps you write better programs—pytest documentation. https://docs.pytest.org/en/7.0.x/

  17. Home - Ivves. https://ivves.eu/


Acknowledgements

This work was funded by Business Finland under grant agreement ITEA-2019-18022-IVVES of the ITEA3 programme [17]. The authors would like to thank Alexei Vyskubov and Matvej Pashkovskiy from F-Secure for the discussions and insightful input that led to the findings of this paper.

Author information


Correspondence to Vladimir T. Kramar.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Kramar, V.T., Nurminen, J.K., Aalto, T. (2022). Grouping Test Results with the Common Root Cause Using String Similarity Algorithms. In: Daimi, K., Al Sadoon, A. (eds) Proceedings of the ICR’22 International Conference on Innovations in Computing Research. ICR 2022. Advances in Intelligent Systems and Computing, vol 1431. Springer, Cham. https://doi.org/10.1007/978-3-031-14054-9_21
