research-article

Poracle: Testing Patches under Preservation Conditions to Combat the Overfitting Problem of Program Repair

Published: 21 December 2023

Abstract

Users of test-driven program repair tools still suffer from the overfitting problem: a generated patch may pass all available tests without being correct. Existing work treats users as merely passive consumers of the tests. However, what if they are willing to modify a test to better assess the patches obtained from a repair tool? In this work, we propose a novel semi-automatic patch-classification methodology named Poracle. Our key contributions are threefold. First, we design a novel lightweight specification method that reuses the existing test. Specifically, the users extend the existing failing test with a preservation condition, that is, the condition under which the patched and pre-patched versions should produce the same output. Second, we develop a fuzzer that performs differential fuzzing with a test containing a preservation condition. Once we find an input that satisfies the specified preservation condition but produces different outputs between the patched and pre-patched versions, we classify the patch as incorrect with high confidence. We show that our approach is more effective than four state-of-the-art patch classification approaches. Third, we show through a user study that users find our semi-automatic patch assessment method more effective and preferable to manual assessment.
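The idea of checking a patch against a preservation condition can be illustrated with a small sketch. This is not Poracle's implementation or API: the function names (`original`, `correct_patch`, `overfitting_patch`, `preservation_condition`, `find_violation`), the toy clamping bug, and the input range are all hypothetical, and plain random sampling stands in for the paper's fuzzer. The assumed failing test is that an input of 150 should yield 100.

```python
import random
from typing import Callable, Optional


def original(x: int) -> int:
    """Pre-patched version (hypothetical): clamps below 0 but not above 100."""
    if x < 0:
        return 0
    return x


def correct_patch(x: int) -> int:
    """A patch that adds the missing upper clamp."""
    if x < 0:
        return 0
    if x > 100:
        return 100
    return x


def overfitting_patch(x: int) -> int:
    """A patch that passes the failing test (150 -> 100) but breaks other inputs."""
    if x < 0:
        return 0
    return 100


def preservation_condition(x: int) -> bool:
    """Assumed user-written condition: inputs up to 100 were already handled
    correctly, so the patch should not change their output."""
    return x <= 100


def find_violation(
    pre: Callable[[int], int],
    post: Callable[[int], int],
    condition: Callable[[int], bool],
    trials: int = 1000,
    seed: int = 0,
) -> Optional[int]:
    """Randomly sample inputs; return one that satisfies the condition but
    makes the two versions disagree, or None if no such input was found."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.randint(-200, 200)
        if not condition(x):
            continue  # outside the region where behaviour must be preserved
        if pre(x) != post(x):
            return x  # evidence that the patch is incorrect
    return None


if __name__ == "__main__":
    print("correct patch:", find_violation(original, correct_patch, preservation_condition))
    print("overfitting patch:", find_violation(original, overfitting_patch, preservation_condition))
```

Both patches pass the failing test, but only the overfitting one yields a counterexample inside the preserved region. A result of `None` means only that no violation was found within the sampling budget; it does not establish that the patch is correct.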


Cited By

  • (2024) Patch Correctness Assessment: A Survey. ACM Transactions on Software Engineering and Methodology 34:2 (1-50). DOI: 10.1145/3702972. Online publication date: 8-Nov-2024
  • (2024) Enhancing the Efficiency of Automated Program Repair via Greybox Analysis. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (1719-1731). DOI: 10.1145/3691620.3695602. Online publication date: 27-Oct-2024
  • (2024) Verification of Programs with Common Fragments. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (487-491). DOI: 10.1145/3663529.3663783. Online publication date: 10-Jul-2024

Published In

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 2
February 2024
947 pages
EISSN: 1557-7392
DOI: 10.1145/3618077
Editor: Mauro Pezzè

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 December 2023
Online AM: 26 September 2023
Accepted: 06 September 2023
Revised: 30 July 2023
Received: 24 January 2023
Published in TOSEM Volume 33, Issue 2


Author Tags

  1. Automated program repair
  2. overfitting problem
  3. patch validation
  4. patch classification
  5. preservation condition

Qualifiers

  • Research-article

Funding Sources

  • National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT)
  • Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT)

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 249
  • Downloads (Last 6 weeks): 27
Reflects downloads up to 16 Feb 2025
