Abstract
Evaluation is the foundation of automated program repair (APR), as it provides empirical evidence on the strengths and weaknesses of APR techniques. However, the reliability of such evaluation is often threatened by various biases introduced during the evaluation process. Consequently, bias exploration, which uncovers biases in APR evaluation, has become a pivotal activity and has been performed since the early years, when pioneering APR techniques were proposed. Unfortunately, there is still no methodology that supports the systematic understanding and discovery of evaluation biases in APR, which impedes the mitigation of such biases and threatens the evaluation of APR techniques.
In this work, we propose to systematically understand existing evaluation biases by rigorously conducting the first systematic literature review on known biases, and to systematically uncover new biases by building a taxonomy that categorizes evaluation biases. As a result, we identify 17 previously investigated biases and uncover a new bias in the usage of patch validation strategies. To validate this new bias, we design and implement an executable framework, APRConfig, with which we evaluate three typical patch validation strategies using four representative heuristic-based and constraint-based APR techniques on three bug datasets. Overall, this article distills 13 findings on bias understanding, discovery, and validation. The systematic exploration we performed and the open-source executable framework we propose provide new insights, as well as an infrastructure, for future exploration and mitigation of biases in APR evaluation.
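To make the notion of a patch validation strategy more concrete, the following Python sketch models test-suite-based patch validation under heavy simplifying assumptions: each candidate patch is represented only by the set of tests it fails, and the two strategies shown (`full_suite`, which runs every test, and `failing_first`, which runs the originally failing tests first and stops at the first failure) are hypothetical illustrations, not the three strategies evaluated in the article. All names (`Patch`, `validate`, `evaluate`, the `budget` parameter) are invented for this example. In this toy setting, a fixed budget of test executions means the choice of strategy alone can change which plausible patches are reported.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Patch:
    """Hypothetical candidate patch, modeled only by the tests it fails."""
    name: str
    failing_tests: frozenset


def validate(patch: Patch, order: Sequence[str], stop_at_first_failure: bool) -> Tuple[bool, int]:
    """Run tests in `order`; return (plausible, number of test executions)."""
    executed = 0
    plausible = True
    for test in order:
        executed += 1
        if test in patch.failing_tests:
            plausible = False
            if stop_at_first_failure:
                break
    return plausible, executed


def evaluate(patches: Sequence[Patch], tests: Sequence[str], originally_failing: Sequence[str],
             strategy: str, budget: Optional[int] = None) -> Tuple[List[str], int]:
    """Validate patches in order under a hypothetical strategy and optional test-execution budget."""
    if strategy == "full_suite":
        # Run the whole suite in its default order, never stopping early.
        order, stop = list(tests), False
    elif strategy == "failing_first":
        # Run originally failing tests first, then the rest, and stop at the first failure.
        order = list(originally_failing) + [t for t in tests if t not in originally_failing]
        stop = True
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    plausible: List[str] = []
    cost = 0
    for patch in patches:
        ok, n = validate(patch, order, stop)
        if budget is not None and cost + n > budget:
            break  # the budget runs out before this patch is fully validated
        cost += n
        if ok:
            plausible.append(patch.name)
    return plausible, cost


# Toy example: four tests, where t4 fails on the buggy program.
tests = ["t1", "t2", "t3", "t4"]
failing = ["t4"]
patches = [
    Patch("p1", frozenset({"t4"})),  # does not fix the bug
    Patch("p2", frozenset({"t2"})),  # fixes t4 but breaks t2
    Patch("p3", frozenset()),        # passes every test (plausible)
]

for strategy in ("full_suite", "failing_first"):
    print(strategy, evaluate(patches, tests, failing, strategy),
          evaluate(patches, tests, failing, strategy, budget=8))
```

Without a budget, both strategies report the same plausible patch (`p3`), at a cost of 12 versus 8 test executions; with a budget of 8 executions, only `failing_first` reaches `p3`. This toy model is only meant to show why the validation strategy is an evaluation parameter worth controlling; it does not reflect APRConfig's actual design.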
Index Terms
- Seeing the Whole Elephant: Systematically Understanding and Uncovering Evaluation Biases in Automated Program Repair