SpongeBugs: Automatically generating fix suggestions in response to static code analysis warnings

https://doi.org/10.1016/j.jss.2020.110671

Highlights

  • We present SpongeBugs: a technique to fix violations of static code analysis rules.

  • SpongeBugs supports 11 widely-used rules checked by SonarQube and SpotBugs.

  • SpongeBugs is completely automatic, precise, and scalable.

  • SpongeBugs generated hundreds of fixes accepted in 12 open-source projects.

Abstract

Static code analysis tools such as FindBugs and SonarQube are widely used on open-source and industrial projects to detect a variety of issues that may negatively affect the quality of software. Despite these tools’ popularity and high level of automation, several empirical studies report that developers normally fix only a small fraction (typically, less than 10% (Marcilio et al., 2019)) of the reported issues—so-called “warnings”. If these analysis tools could also automatically provide suggestions on how to fix the issues that trigger some of the warnings, their feedback would become more actionable and more directly useful to developers.

In this work, we investigate whether it is feasible to automatically generate fix suggestions for common warnings issued by static code analysis tools, and to what extent developers are willing to accept such suggestions into the codebases they are maintaining. To this end, we implemented SpongeBugs, a Java program transformation technique that fixes violations of 11 distinct rules checked by two well-known static code analysis tools (SonarQube and SpotBugs). Fix suggestions are generated automatically based on templates, which are instantiated in a way that removes the source of the warnings; templates for some rules are even capable of producing multi-line patches. Based on the suggestions provided by SpongeBugs, we submitted 38 pull requests, comprising 946 fixes generated automatically by our technique, to various open-source Java projects, including Eclipse UI – a core component of the Eclipse IDE – as well as SonarQube and SpotBugs themselves. Project maintainers accepted 87% of our fix suggestions (97% of them without any modifications). We further evaluated the applicability of our technique on software written by students and on a curated collection of bugs. All results indicate that our approach to generating fix suggestions is feasible, flexible, and can help increase the applicability of static code analysis tools.

Introduction

Static code analysis tools (SATs) are becoming increasingly popular as a way of detecting possible sources of defects earlier in the development process (Habib and Pradel, 2018). By working statically on the source or byte code of a project, these tools are applicable to large code bases (Johnson et al., 2013b, Liu et al., 2019), where they quickly search for patterns that may indicate problems – bugs, questionable design choices, or failures to follow stylistic conventions (Barik et al., 2016, Tómasdóttir et al., 2017) – and report them to users. There is evidence (Beller et al., 2016) that using these tools can help developers monitor and improve software code quality; indeed, static code analysis tools are used for both commercial and open-source software development (Marcilio et al., 2019a, Habib and Pradel, 2018, Liu et al., 2019). Some projects’ development rules even require that code has to clear the checks of a certain SAT before it can be released (Marcilio et al., 2019a, Beller et al., 2016, Ayewah et al., 2008).

At the same time, some features of SATs limit their wider applicability in practice. One key problem is that SATs are necessarily imprecise in checking for rule violations; in other words, they report warnings that may or may not correspond to an actual mistake. As a result, the first time a static analysis tool is run on a project, it is likely to report thousands of warnings (Habib and Pradel, 2018, Johnson et al., 2013b), which overwhelms the developers’ capacity to sift through them and select those that are most relevant and should be fixed (Marcilio et al., 2019a). Another related issue with using SATs in practice is that understanding the problem highlighted by a warning and coming up with a suitable fix is often nontrivial (Marcilio et al., 2019a, Johnson et al., 2013b).

Our research aims at improving the practical usability of SATs by automatically providing fix suggestions: modifications to the source code that make it compliant with the rules checked by the analysis tools. We developed an approach, called SpongeBugs and described in Section 3, whose current implementation works on Java code. SpongeBugs detects violations of 11 different rules checked by SonarQube and SpotBugs (successor to FindBugs (Habib and Pradel, 2018))—two well-known static code analysis tools, routinely used by many software companies and consortia, including large ones such as the Apache Software Foundation and the Eclipse Foundation. The rules checked by SpongeBugs are among the most widely used in these two tools, and cover different kinds of code issues (ranging from performance to correct behavior, style, and other aspects). For each violation it detects, SpongeBugs automatically suggests and presents a fix to the user.
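As a concrete illustration of the kind of warning and fix involved (the code below is our own example, not taken from the paper; the actual rules and fix templates are described in Section 3), consider the rule, checked by both SonarQube and SpotBugs, that string objects should be compared with equals() rather than ==:

```java
// Illustrative violation and fix: comparing strings with == checks object
// identity, not content, and is flagged by both SonarQube and SpotBugs.
class GreetingChecker {

    // Before: triggers a warning, because == compares references.
    boolean isHelloBefore(String message) {
        return message == "hello";
    }

    // After: one possible fix replaces the identity comparison with equals();
    // putting the literal first also avoids a NullPointerException when
    // message is null. (Whether SpongeBugs instantiates exactly this form is
    // described in Section 3; this snippet only illustrates the rule.)
    boolean isHelloAfter(String message) {
        return "hello".equals(message);
    }
}
```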

By construction, the fixes SpongeBugs suggests remove the origin of a rule’s violation, but the maintainers still have to decide – based on their overall knowledge of the project – whether to accept and merge each suggestion. To assess whether developers are indeed willing to accept SpongeBugs’s suggestions, Section 5 presents the results of an empirical evaluation where we applied it to 12 open-source Java projects and submitted 946 fix suggestions as pull requests to the projects. Project maintainers accepted 825 (87%) fix suggestions—97% of them without any modifications. This high acceptance rate suggests that SpongeBugs often generates patches of high quality, which developers find adequate and useful.

The empirical evaluation also indicates that SpongeBugs is applicable with good performance to large code bases (1.2 min to process 1,000 lines of code on average). SpongeBugs is also accurate: it generates false positives (spurious rule violations) in less than 0.6% of all reported violations. We also found several cases where SpongeBugs correctly detected rule violations that SonarQube missed (Section 5.1.1).

To further demonstrate SpongeBugs’s versatility, Section 5 also discusses how SpongeBugs complements program repair tools (e.g., avatar (Liu et al., 2019)) and how it performs on software whose main contributors are non-professionals (i.e., students). With few exceptions – which we discuss throughout Section 5 to inform further progress in this line of work – SpongeBugs worked as intended: it provides sound, easy-to-apply suggestions to fix static rule violations.

The work reported in this paper is part of a large body of research (see Section 2) that deals with helping developers detect and fix bugs and code smells. SpongeBugs’ approach is characterized by the following features: (i) it targets static rules that correspond to frequent mistakes that are often fixable syntactically; (ii) it builds fix suggestions that remove the source of warnings by construction; (iii) it scales to large code bases because it is based on lightweight program transformation techniques. Despite the focus on conceptually simple rule violations, SpongeBugs can generate nontrivial patches, including some that modify multiple hunks of code at once. In summary, SpongeBugs’s focus privileges generating a large number of practically useful fixes over being as broadly applicable as possible. Based on our empirical evaluation, Section 6 discusses the main limitations of SpongeBugs’s approach, and Section 7 outlines directions for further progress in this line of work.
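To illustrate what a multi-hunk suggestion can look like, the sketch below is in the spirit of rule C1 (String literals should not be duplicated), one of the rules SpongeBugs supports; the classes, methods, and the constant name are our own illustration rather than output produced by the tool:

```java
// Before: the literal "admin" is repeated, which violates a rule in the
// style of C1 (String literals should not be duplicated).
class RoleCheckerBefore {
    boolean isAdmin(String role) { return "admin".equals(role); }
    String defaultRole()         { return "admin"; }
    String accessLabel()         { return "role: " + "admin"; }
}

// After: a fix of this kind touches several hunks at once, because it both
// introduces a constant (the name ADMIN is our illustrative choice) and
// replaces every occurrence of the duplicated literal with it.
class RoleCheckerAfter {
    private static final String ADMIN = "admin";

    boolean isAdmin(String role) { return ADMIN.equals(role); }
    String defaultRole()         { return ADMIN; }
    String accessLabel()         { return "role: " + ADMIN; }
}
```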

This journal article extends a previous conference publication (Marcilio et al., 2019b) by significantly expanding the empirical evaluation of SpongeBugs with: (1) an extended and updated evaluation of SpongeBugs’ applicability in Section 5.1, using a revised implementation with numerous bug fixes; (2) a detailed analysis of accuracy (false positives and false negatives, in Sections 5.1.2 and 5.1.3); (3) a smaller-scale evaluation involving student projects (Section 5.1.4); (4) an experimental assessment (Section 5.3.2) of how SpongeBugs’s three-stage process trades off a modicum of precision for markedly better performance; and (5) an experimental comparison with the Defects4J curated collection of real-world Java bugs (Section 5.4).

Section snippets

Background and related work

Static analysis techniques reason about program behavior statically, that is, without running the program (Nielson et al., 1999). This is in contrast to dynamic analysis techniques, which are instead driven by specific program inputs (provided, for example, by unit tests). Thus, static analysis techniques are often more scalable (because they do not require complete executions) but also less precise (because they over-approximate program behavior to encompass all possible inputs) than dynamic analysis techniques.

SpongeBugs: Approach and implementation

SpongeBugs provides fix suggestions for violations of selected rules that are checked by SonarQube and SpotBugs. Section 3.1 discusses how we selected the rules to check and suggest fixes for. SpongeBugs works by means of source-to-source transformations, implemented as we outline in Section 3.2. This approach has advantages but also limitations in its applicability, which we discuss in Section 3.3.
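Section 5.3.2 later evaluates a three-stage process that trades a little precision for markedly better performance. The sketch below is a rough, hypothetical outline of how such a staged source-to-source pipeline could be organized; all class, method, and stage names are our assumptions, not the implementation described in Section 3.2:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch of a staged fix pipeline: a cheap textual pre-filter
// discards files that cannot violate a rule, only the remaining files are
// analyzed precisely, and a fix template is instantiated for each violation.
class FixPipelineSketch {

    interface Rule {
        // Stage 1: cheap textual check (e.g., "does the file mention the API at all?").
        boolean mightApply(String sourceText);
        // Stage 2: precise detection on the parsed file (parsing omitted here).
        List<Violation> detect(String sourceText);
        // Stage 3: instantiate the fix template for one violation.
        String fix(String sourceText, Violation v);
    }

    record Violation(int startOffset, int endOffset) {}

    String process(Path file, List<Rule> rules) throws Exception {
        String text = Files.readString(file);
        for (Rule rule : rules) {
            if (!rule.mightApply(text)) {
                continue; // skipped cheaply: no precise analysis needed for this rule
            }
            for (Violation v : rule.detect(text)) {
                // A real tool would recompute positions after each rewrite;
                // that bookkeeping is omitted in this sketch.
                text = rule.fix(text, v);
            }
        }
        return text; // transformed source, to be proposed as a suggestion
    }
}
```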

Empirical evaluation of SpongeBugs: Experimental design

The overall goal of this research is to suggest fixes for warnings generated by static code analysis tools. Section 4.1 presents the research questions we answer in this empirical study, which targets:

  • Fifteen open-source projects selected using the criteria we present in Section 4.2;

  • Five student projects developed as part of software engineering courses;

  • Defects4J: a curated collection of faulty Java programs, widely used to evaluate automated program repair (see Section 2) tools.

All data created

Empirical evaluation of SpongeBugs: Results and discussion

The results of our empirical evaluation of SpongeBugs answer the four research questions presented in Section 4.1. For uniformity, the experiments related to RQ1–3 target the 12 projects whose maintainers accepted the pull requests fixing static analysis warnings (top portion of Table 5).

Limitations and threats to validity

Some of SpongeBugs’s transformations may violate a project’s stylistic guidelines (Liu et al., 2018). Take, for example, the primefaces project, which uses a rule about the order of variable declarations within a class, requiring that private constants (private static final fields) be defined after public constants. SpongeBugs’s fixes for rule C1 (String literals should not be duplicated) introduce a new private constant, whose placement may violate this
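The following hypothetical layout illustrates the conflict; the class and constant names are ours, and the exact insertion point SpongeBugs chooses may differ:

```java
// Hypothetical layout (ours, not from the paper) showing how a generated
// private constant can clash with a project convention that requires
// private constants to be declared after public ones.
class MenuRenderer {
    // If the C1 fix inserts the new private constant here...
    private static final String STYLE_CLASS = "ui-menuitem"; // introduced by the fix

    // ...it ends up before the project's public constants, violating the
    // declaration-order convention mentioned above.
    public static final String COMPONENT_FAMILY = "org.primefaces.component";

    String styleClassFor(Object item) {
        return STYLE_CLASS; // duplicated literal occurrences replaced by the constant
    }
}
```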

Conclusions

In this work we introduced a new approach and a tool (SpongeBugs) that finds and repairs violations of rules checked by static code analysis tools such as SonarQube, FindBugs, and SpotBugs. We designed SpongeBugs to deal with rule violations that are frequently fixed in both private and open-source projects. We assessed SpongeBugs by running it on 12 popular open-source projects and submitted a large portion (946 in total) of the fixes it generated as pull requests to the projects. Overall,

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We thank the maintainers for reviewing our patches; and the reviewers of SCAM and JSS for their helpful comments. This work was partially supported by CNPq, Brazil (#406308/2016-0 and #309032/2019-9); and by the Swiss National Science Foundation (SNSF) grant Hi-Fi (#200021_182060).

References (48)

  • Liu, K., et al., 2018. Mining fix patterns for FindBugs violations. IEEE Trans. Softw. Eng.

  • Yu, Y., et al., 2016. Reviewer recommendation for pull-requests in GitHub: What can we learn from code review and bug assignment? Inf. Softw. Technol.

  • Aftandilian, E., et al. Building useful program analysis tools using an extensible Java compiler.

  • Aho, A.V., et al., 2006. Compilers: Principles, Techniques, and Tools.

  • Ayewah, N., et al., 2008. Using static analysis to find bugs. IEEE Softw.

  • Bader, J., et al., 2019. Getafix: Learning to fix bugs automatically. Proc. ACM Program. Lang.

  • Barik, T., et al. From quick fixes to slow fixes: Reimagining static analysis resolutions to enable design space exploration.

  • Bavishi, R., et al. Phoenix: Automated data-driven synthesis of repairs for static analysis violations.

  • Beller, M., et al. Analyzing the state of static analysis: A large-scale evaluation in open source software.

  • Brito, A., Xavier, L., Hora, A., Valente, M.T., 2018. Why and how Java developers break APIs. In: 25th International...

  • Carvalho, A., Luz, W., Marcílio, D., Bonifacio, R., Pinto, G., Canedo, E.D., 2020. C-3PR: A bot for fixing static...

  • Cherem, S., et al. Practical memory leak detection using guarded value-flow analysis.

  • Dantas, R., et al. Reconciling the past and the present: An empirical study on the application of source code transformations to automatically rejuvenate Java programs.

  • Digkas, G., et al. How do developers fix issues and pay back technical debt in the Apache ecosystem?

  • Gazzola, L., et al., 2019. Automatic software repair: A survey. IEEE Trans. Softw. Eng.

  • Georges, A., et al. Statistically rigorous Java performance evaluation.

  • Habib, A., et al. How many of all bugs do we find? A study of static bug detectors.

  • Johnson, B., Song, Y., Murphy-Hill, E.R., Bowdidge, R.W., 2013a. Why don’t software developers use static analysis...

  • Johnson, B., et al. Why don’t software developers use static analysis tools to find bugs?

  • Just, R., et al. Defects4J: A database of existing faults to enable controlled testing studies for Java programs.

  • Kalliamvakou, E., et al., 2015. An in-depth study of the promises and perils of mining GitHub. Empir. Softw. Eng.

  • Khedker, U., et al., 2009. Data Flow Analysis: Theory and Practice.

  • Kim, J., et al. Improving refactoring speed by 10x.

  • Kim, D., et al. Automatic patch generation learned from human-written patches.