research-article

Poracle: Testing Patches under Preservation Conditions to Combat the Overfitting Problem of Program Repair

Authors: Elkhan Ismayilzada, Md Mazba Ur Rahman, Jooyong Yi

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 2

Article No.: 44, Pages 1 - 39

https://doi.org/10.1145/3625293

Published: 21 December 2023

Abstract

Users of test-driven program repair tools still suffer from the overfitting problem: a generated patch may pass all available tests without being correct. Existing work treats users as merely passive consumers of the tests. However, what if they are willing to modify a test to better assess the patches obtained from a repair tool? In this work, we propose a novel semi-automatic patch-classification methodology named Poracle. Our key contributions are threefold. First, we design a novel lightweight specification method that reuses the existing test. Specifically, users extend the existing failing test with a preservation condition, that is, the condition under which the patched and pre-patched versions should produce the same output. Second, we develop a fuzzer that performs differential fuzzing with a test containing a preservation condition. Once we find an input that satisfies the specified preservation condition but produces different outputs between the patched and pre-patched versions, we classify the patch as incorrect with high confidence. We show that our approach is more effective than four state-of-the-art patch classification approaches. Last, we show through a user study that users find our semi-automatic patch assessment method more effective than, and preferable to, manual assessment.
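
To make the idea of a preservation condition concrete, the following is a minimal Python sketch of differential fuzzing under such a condition. It is not the authors' tool or implementation. The functions `original`, `overfitting_patch`, `correct_patch`, `preservation_condition`, and `find_violation`, as well as the toy integer-division bug, are hypothetical names and data invented for illustration. The failing test assumed here is that dividing 1 by 0 should return 0 instead of raising an error.

```python
import random


def original(a, b):
    # Hypothetical pre-patched version: raises ZeroDivisionError when b == 0.
    return a // b


def overfitting_patch(a, b):
    # Hypothetical patch that makes the failing test (a=1, b=0) pass,
    # but does so by special-casing a == 1, which alters unrelated behaviour.
    if a == 1:
        return 0
    return a // b


def correct_patch(a, b):
    # Hypothetical patch that only changes behaviour for the buggy input class.
    if b == 0:
        return 0
    return a // b


def preservation_condition(a, b):
    # Assumed user-written condition: for any non-zero divisor, the patched
    # and pre-patched versions should produce the same output.
    return b != 0


def find_violation(patched, trials=10_000, seed=0):
    """Randomly search for an input that satisfies the preservation
    condition yet yields different outputs before and after the patch."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.randint(-5, 5), rng.randint(-5, 5)
        if not preservation_condition(a, b):
            continue  # outside the region where behaviour must be preserved
        if patched(a, b) != original(a, b):
            return (a, b)  # evidence that the patch is likely incorrect
    return None  # no violation found; the patch is not rejected


if __name__ == "__main__":
    print("overfitting patch violation:", find_violation(overfitting_patch))
    print("correct patch violation:", find_violation(correct_patch))
```

In this sketch a returned input pair plays the role of the counterexample described in the abstract: it satisfies the preservation condition but distinguishes the patched version from the pre-patched one. Plain random sampling stands in for the fuzzer here; a real differential fuzzer would typically use feedback to guide input generation.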

Index Terms

Poracle: Testing Patches under Preservation Conditions to Combat the Overfitting Problem of Program Repair
1. Software and its engineering
  1. Software creation and management
    1. Software development techniques
      1. Automatic programming
    2. Software verification and validation
      1. Software defect analysis
        Software testing and debugging

Published In

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 2

February 2024, 947 pages

EISSN: 1557-7392

DOI: 10.1145/3618077

Editor: Mauro Pezzè, USI Università della Svizzera italiana and SIT Schaffhausen Institute of Technology, Switzerland

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 December 2023

Online AM: 26 September 2023

Accepted: 06 September 2023

Revised: 30 July 2023

Received: 24 January 2023

Published in TOSEM Volume 33, Issue 2

Funding Sources

National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT)
Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT)
