research-article

Automated patch correctness assessment: how far are we?

Authors:

Bo Lin,

Hai JinAuthors Info & Claims

ASE '20: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering

Pages 968 - 980

https://doi.org/10.1145/3324884.3416590

Published: 27 January 2021 Publication History

Get Access

Abstract

Test-based automated program repair (APR) has attracted huge attention from both industry and academia. Despite the significant progress made in recent studies, the overfitting problem (i.e., the generated patch is plausible but overfitting) is still a major and long-standing challenge. Therefore, plenty of techniques have been proposed to assess the correctness of patches either in the patch generation phase or in the evaluation of APR techniques. However, the effectiveness of existing techniques has not been systematically compared and little is known to their advantages and disadvantages. To fill this gap, we performed a large-scale empirical study in this paper. Specifically, we systematically investigated the effectiveness of existing automated patch correctness assessment techniques, including both static and dynamic ones, based on 902 patches automatically generated by 21 APR tools from 4 different categories. Our empirical study revealed the following major findings: (1) static code features with respect to patch syntax and semantics are generally effective in differentiating overfitting patches over correct ones; (2) dynamic techniques can generally achieve high precision while heuristics based on static code features are more effective towards recall; (3) existing techniques are more effective towards certain projects and types of APR techniques while less effective to the others; (4) existing techniques are highly complementary to each other. For instance, a single technique can only detect at most 53.5% of the overfitting patches while 93.3% of them can be detected by at least one technique when the oracle information is available. Based on our findings, we designed an integration strategy to first integrate static code features via learning, and then combine with others by the majority voting strategy. Our experiments show that the strategy can enhance the performance of existing patch correctness assessment techniques significantly.

References

[1]

2020. AgitarOne Homepage. http://www.agitar.com/index.html.

Abstract

References

Cited By

Index Terms

Recommendations

Evaluating representation learning of code changes for predicting patch correctness in program repair

Predicting Patch Correctness Based on the Similarity of Failing Test Cases

Context-Aware Code Change Embedding for Better Patch Correctness Assessment

Comments

Information

Published In

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Funding Sources

Conference

Acceptance Rates

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

Login options

Full Access

View options

PDF

eReader

Share

Share this Publication link

Share on social media

Affiliations