Abstract
This paper presents an innovative concept of test log categorization that demonstrates how five string similarity algorithms (Python's built-in difflib library, the Jaccard index, Jaro-Winkler distance, cosine similarity, and the Levenshtein ratio) are applied to test logs from pytest, one of the popular assertion-based test frameworks. To minimize the manual, error-prone work that software engineers face when analyzing multiple test runs daily, these test logs are grouped into three distinct categories for easier root-cause analysis and fixes: C1, similar failures in the same test; C2, a similar failure in two different tests; and C3, different failures in the same test. The presented work demonstrates how efficient the string similarity algorithms can be.
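As a rough illustration of the idea, the sketch below scores two failure messages with three of the five measures named above (difflib, the Jaccard index, and a normalised Levenshtein similarity) and maps a pair of test results to C1, C2, or C3. It uses only the Python standard library. The whitespace tokenisation, the normalisation of the Levenshtein distance, the 0.8 threshold, the function names, and the sample log strings are all assumptions made for this example and are not taken from the paper; Jaro-Winkler and cosine similarity are omitted for brevity.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.8  # hypothetical cut-off for "similar"; not a value from the paper


def difflib_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1] computed by Python's built-in difflib."""
    return SequenceMatcher(None, a, b).ratio()


def jaccard_index(a: str, b: str) -> float:
    """Jaccard index over whitespace-separated tokens (tokenisation is an assumption)."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a and not set_b:
        return 1.0  # two empty logs are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """1 - distance / longer length; one of several possible 'ratio' definitions."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein_distance(a, b) / longest


def categorize(test_a, log_a, test_b, log_b, similarity=difflib_ratio):
    """Map a pair of failed test results to C1, C2, C3, or None (no category)."""
    similar = similarity(log_a, log_b) >= THRESHOLD
    same_test = test_a == test_b
    if same_test and similar:
        return "C1"  # similar failures in the same test
    if not same_test and similar:
        return "C2"  # similar failure in two different tests
    if same_test:
        return "C3"  # different failures in the same test
    return None      # different tests, different failures


# Made-up failure messages, for illustration only.
log_1 = "AssertionError: assert 200 == 500"
log_2 = "AssertionError: assert 200 == 503"
log_3 = "TimeoutError: connection to database timed out after 30 s"

print(categorize("test_login", log_1, "test_login", log_2))   # same test, similar logs
print(categorize("test_login", log_1, "test_logout", log_2))  # different tests, similar logs
print(categorize("test_login", log_1, "test_login", log_3))   # same test, different logs
```

Passing `similarity=jaccard_index` or `similarity=levenshtein_similarity` to `categorize` swaps the measure without changing the grouping logic, which is one simple way to compare how the measures behave on the same pairs of logs.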
References
1. Uspenskij, M.B.: Log mining and knowledge-based models in data storage systems diagnostics. In: E3S Web of Conferences, vol. 140, p. 03006 (2019)
2. Splunk—Turn Data Into Doing. https://www.splunk.com/
3. Sematext Logs—Cloud Log Management Service—Hosted ELK. https://sematext.com/logsene/
4. He, P., Zhu, J., Zheng, Z., Lyu, M.R.: Drain: an online log parsing approach with fixed depth tree. In: 2017 IEEE International Conference on Web Services (ICWS), pp. 33–40. IEEE (2017)
5. Du, M., Li, F.: Spell: online streaming parsing of large unstructured system logs. In: IEEE Transactions on Knowledge and Data Engineering, vol. 31, no. 11, pp. 2213–2227 (2019). https://ieeexplore.ieee.org/document/8489912/
6. Ren, Y., et al.: System log detection model based on conformal prediction. Electronics 9(2), 232 (2020)
7. Du, M., Li, F., Zheng, G., Srikumar, V.: DeepLog. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, New York, NY, USA, pp. 1285–1298. ACM (2017)
8. Continuous Integration. https://martinfowler.com/articles/continuousIntegration.html
9. Durieux, T., Abreu, R., Monperrus, M., Bissyandé, T.F., Cruz, L.: An analysis of 35+ million jobs of Travis CI. In: 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME) (2019). https://travis-ci.org
10. difflib - Helpers for computing deltas (2022). https://docs.python.org/3/library/difflib.html
11. Cohen, W.W., Ravikumar, P., Fienberg, S.: A comparison of string metrics for matching names and records (2003)
12. Jaccard, P.: The distribution of the flora in the alpine zone.1. New Phytol. 11(2), 37–50 (1912)
13. Han, J., Kamber, M., Pei, J.: Getting to know your data. In: Data Mining, pp. 39–82. Elsevier (2012). https://linkinghub.elsevier.com/retrieve/pii/B9780123814791000022
14. Levenshtein, V.I., et al.: Binary codes capable of correcting deletions, insertions, and reversals. In: Soviet Physics Doklady, vol. 10, no. 8, pp. 707–710 (1966)
15. Home—WithSecure™. https://www.withsecure.com/en/home
16. pytest: helps you write better programs—pytest documentation. https://docs.pytest.org/en/7.0.x/
17. Home - Ivves. https://ivves.eu/
Acknowledgements
This work was funded by local authorities (“Business Finland”) under grant agreement ITEA-2019-18022-IVVES of the ITEA3 programme [17]. The authors would like to express gratitude to Alexei Vyskubov and Matvej Pashkovskiy from F-Secure for the discussions and their insightful input that led to the findings of this paper.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kramar, V.T., Nurminen, J.K., Aalto, T. (2022). Grouping Test Results with the Common Root Cause Using String Similarity Algorithms. In: Daimi, K., Al Sadoon, A. (eds) Proceedings of the ICR’22 International Conference on Innovations in Computing Research. ICR 2022. Advances in Intelligent Systems and Computing, vol 1431. Springer, Cham. https://doi.org/10.1007/978-3-031-14054-9_21
DOI: https://doi.org/10.1007/978-3-031-14054-9_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14053-2
Online ISBN: 978-3-031-14054-9
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)