Grouping Test Results with the Common Root Cause Using String Similarity Algorithms

Kramar, Vladimir T.; Nurminen, Jukka K.; Aalto, Tatu

doi:10.1007/978-3-031-14054-9_21

Conference paper
First Online: 11 August 2022

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1431)

Abstract

This paper presents an innovative concept of test log categorization and demonstrates how five string similarity algorithms (Python's built-in difflib library, the Jaccard index, Jaro-Winkler distance, cosine similarity, and the Levenshtein ratio) are applied to test logs from pytest, one of the popular assertion-based test frameworks. To minimize the manual, error-prone work of software engineers who analyze multiple test runs daily, these test logs are grouped into three distinct categories for easier root-cause analysis and fixes: C1, similar failures in the same test; C2, a similar failure in two different tests; and C3, a different failure in the same test. The presented work demonstrates how efficient the string similarity algorithms can be.
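The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenization, the normalization used for the Levenshtein ratio, the helper names, and the 0.8 threshold are all assumptions made for illustration, and Jaro-Winkler distance is omitted for brevity. Only `difflib.SequenceMatcher` comes directly from the standard library named in the abstract.

```python
import difflib
import math
import re
from collections import Counter


def tokens(text):
    """Split a log message into lowercase word tokens (an assumed tokenization)."""
    return re.findall(r"\w+", text.lower())


def difflib_ratio(a, b):
    """Similarity in [0, 1] from Python's built-in difflib."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def jaccard_index(a, b):
    """Size of the token-set intersection divided by the size of the union."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a, b):
    """Cosine of the angle between the two token-count vectors."""
    count_a, count_b = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(count_a[t] * count_b[t] for t in count_a)
    norm_a = math.sqrt(sum(v * v for v in count_a.values()))
    norm_b = math.sqrt(sum(v * v for v in count_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def levenshtein_distance(a, b):
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_ratio(a, b):
    """Edit distance normalized by the longer string (one common definition;
    the paper's exact normalization may differ)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein_distance(a, b) / longest


def categorize(test_a, log_a, test_b, log_b,
               similarity=difflib_ratio, threshold=0.8):
    """Map a pair of failure logs to C1, C2, or C3 (hypothetical threshold)."""
    similar = similarity(log_a, log_b) >= threshold
    same_test = test_a == test_b
    if similar and same_test:
        return "C1"  # similar failures in the same test
    if similar:
        return "C2"  # a similar failure in two different tests
    if same_test:
        return "C3"  # a different failure in the same test
    return None      # different failures in different tests: no category
```

Any of the similarity functions can be passed as the `similarity` argument, for example `categorize("test_login", log_1, "test_logout", log_2, similarity=jaccard_index)`, where the test names and logs are placeholders. In practice the threshold would need to be tuned per algorithm, since the scores are not directly comparable.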

References

1. Uspenskij, M.B.: Log mining and knowledge-based models in data storage systems diagnostics. In: E3S Web of Conferences, vol. 140, p. 03006 (2019)
2. Splunk - Turn Data Into Doing. https://www.splunk.com/
3. Sematext Logs - Cloud Log Management Service - Hosted ELK. https://sematext.com/logsene/
4. He, P., Zhu, J., Zheng, Z., Lyu, M.R.: Drain: an online log parsing approach with fixed depth tree. In: 2017 IEEE International Conference on Web Services (ICWS), pp. 33–40. IEEE (2017)
5. Du, M., Li, F.: Spell: online streaming parsing of large unstructured system logs. In: IEEE Transactions on Knowledge and Data Engineering, vol. 31, no. 11, pp. 2213–2227 (2019). https://ieeexplore.ieee.org/document/8489912/
6. Ren, Y., et al.: System log detection model based on conformal prediction. Electronics 9(2), 232 (2020)
7. Du, M., Li, F., Zheng, G., Srikumar, V.: DeepLog. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, New York, NY, USA, pp. 1285–1298. ACM (2017)
8. Continuous Integration. https://martinfowler.com/articles/continuousIntegration.html
9. Durieux, T., Abreu, R., Monperrus, M., Bissyandé, T.F., Cruz, L.: An analysis of 35+ million jobs of Travis CI. In: 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME) (2019). https://travis-ci.org
10. difflib - Helpers for computing deltas (2022). https://docs.python.org/3/library/difflib.html
11. Cohen, W.W., Ravikumar, P., Fienberg, S.: A comparison of string metrics for matching names and records (2003)
12. Jaccard, P.: The distribution of the flora in the alpine zone. New Phytol. 11(2), 37–50 (1912)
13. Han, J., Kamber, M., Pei, J.: Getting to know your data. In: Data Mining, pp. 39–82. Elsevier (2012). https://linkinghub.elsevier.com/retrieve/pii/B9780123814791000022
14. Levenshtein, V.I., et al.: Binary codes capable of correcting deletions, insertions, and reversals. In: Soviet Physics Doklady, vol. 10, no. 8, pp. 707–710 (1966)
15. Home - WithSecure™. https://www.withsecure.com/en/home
16. pytest: helps you write better programs - pytest documentation. https://docs.pytest.org/en/7.0.x/
17. Home - Ivves. https://ivves.eu/

Acknowledgements

This work was funded by local authorities (“Business Finland”) under grant agreement ITEA-2019-18022-IVVES of the ITEA3 programme [17]. The authors would like to express their gratitude to Alexei Vyskubov and Matvej Pashkovskiy from F-Secure for the discussions and their insightful input that led to the findings of this paper.

Author information

Authors and Affiliations

University of Helsinki, Helsinki, Finland
Vladimir T. Kramar & Jukka K. Nurminen
F-Secure, Helsinki, Finland
Tatu Aalto

Corresponding author

Correspondence to Vladimir T. Kramar.

Editor information

Editors and Affiliations

University of Detroit Mercy, Detroit, MI, USA
Kevin Daimi
Kent Institute Australia, Sydney, NSW, Australia
Abeer Al Sadoon

Cite this paper

Kramar, V.T., Nurminen, J.K., Aalto, T. (2022). Grouping Test Results with the Common Root Cause Using String Similarity Algorithms. In: Daimi, K., Al Sadoon, A. (eds) Proceedings of the ICR’22 International Conference on Innovations in Computing Research. ICR 2022. Advances in Intelligent Systems and Computing, vol 1431. Springer, Cham. https://doi.org/10.1007/978-3-031-14054-9_21

DOI: https://doi.org/10.1007/978-3-031-14054-9_21
Published: 11 August 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14053-2
Online ISBN: 978-3-031-14054-9
eBook Packages: Intelligent Technologies and Robotics (R0)
