
Using code reviews to automatically configure static analysis tools

Empirical Software Engineering

Abstract

Developers often use Static Code Analysis Tools (SCATs) to automatically detect various kinds of quality flaws in their source code. Since many warnings raised by SCATs may be irrelevant to a given project or organization, it is possible to leverage information from the project's development history to automatically configure which warnings a SCAT should raise and which it should suppress. In this paper, we propose an automated approach (Auto-SCAT) that leverages (statement-level) code review comments to recommend the SCAT warnings, or warning categories, to be enabled. To this aim, we trace code review comments onto SCAT warnings by leveraging the warnings' descriptions and messages, as well as review comments made in other projects. We apply Auto-SCAT to study how CheckStyle, a well-known SCAT, can be configured in the context of six Java open-source projects, all of which use Gerrit to handle code reviews. Our results show that Auto-SCAT classifies code review comments into CheckStyle checks with a precision of 61% and a recall of 52%; when also considering code review comments unrelated to CheckStyle warnings, Auto-SCAT achieves a precision and a recall of ≈75%. Furthermore, Auto-SCAT can configure CheckStyle with a precision of 72.7% at the check level and 96.3% at the category level. Finally, our findings highlight that Auto-SCAT outperforms state-of-the-art baselines based on default CheckStyle configurations or on the history of previously-removed warnings.
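To make the tracing step concrete, the sketch below shows one plausible way to match a review comment to CheckStyle checks by comparing it against the checks' documentation text with TF-IDF and cosine similarity, in the spirit of classic IR pipelines. This is a minimal illustration, not the authors' implementation: the check descriptions, the similarity threshold, the function name match_comment, and the use of scikit-learn are all assumptions made for the example.

```python
# Hedged sketch (NOT the paper's exact pipeline): rank CheckStyle checks by
# the textual similarity of their documentation to a review comment.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical corpus: a few CheckStyle check names with illustrative
# snippets of their documentation text.
checks = {
    "LineLength": "checks for long lines, line length exceeds the maximum",
    "MagicNumber": "checks that there are no magic numbers, numeric literals",
    "JavadocMethod": "checks the javadoc of a method or constructor",
}

def match_comment(comment, checks, threshold=0.1):
    """Return (check, score) pairs whose documentation is textually similar
    to the review comment, sorted by decreasing cosine similarity.
    The 0.1 threshold is arbitrary and chosen only for this example."""
    names = list(checks)
    corpus = [checks[n] for n in names] + [comment]
    # English stop-word removal stands in for the usual IR preprocessing;
    # the paper's pipeline may differ (e.g., it may also apply stemming).
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(corpus)
    scores = cosine_similarity(tfidf[-1], tfidf[:-1]).ravel()
    ranked = sorted(zip(names, scores), key=lambda p: -p[1])
    return [(n, s) for n, s in ranked if s >= threshold]

print(match_comment("this line is too long, please wrap it", checks))
# Expected: LineLength ranks first, since it shares terms such as
# "line" and "long" with the comment; the other checks score near zero.
```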



Notes

  1. http://checkstyle.sourceforge.net

  2. https://www.gerritcodereview.com

  3. http://review.couchbase.org/#/c/21805/ line 241, author's name hidden for privacy reasons.

  4. http://checkstyle.sourceforge.net/checks.html

  5. https://www.eclipse.org/mylyn/

  6. https://wiki.eclipse.org/Development_Conventions_and_Guidelines

  7. https://checkstyle.sourceforge.io/sun_style.html

  8. https://checkstyle.sourceforge.io/google_style.html

  9. core/org.eclipse.cdt.ui/src/org/eclipse/cdt/internal/ui/refactoring/pullup/PullUpInformation.java+refs-changes-77-22177-3

  10. client/src/com/vaadin/client/widgets/Escalator.java+refs-changes-28-7628-1


Author information

Corresponding author: Fiorella Zampetti.

Additional information

Communicated by: Lin Tan

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zampetti, F., Mudbhari, S., Arnaoudova, V. et al. Using code reviews to automatically configure static analysis tools. Empir Software Eng 27, 28 (2022). https://doi.org/10.1007/s10664-021-10076-4

