Abstract
Existing evaluations of automated repair techniques focus on the fraction of defects for which a technique can produce a patch, the time needed to produce patches, and how well patches generalize to the intended specification. However, these evaluations have not focused on the applicability of repair techniques or on the characteristics of the defects that these techniques can repair. Questions such as “Can automated repair techniques repair defects that are hard for developers to repair?” and “Are automated repair techniques less likely to repair defects that involve loops?” have not yet been answered. To address such questions, we annotate two large benchmarks totaling 409 C and Java defects in real-world software, ranging from 22K to 2.8M lines of code, with measures of the defect’s importance, the developer-written patch’s complexity, and the quality of the test suite. We then analyze how these measures relate to the ability of seven automated repair techniques (AE, GenProg, Kali, Nopol, Prophet, SPR, and TrpAutoRepair) to produce patches for the defects. We find that automated repair techniques are less likely to produce patches for defects that required developers to write a lot of code or edit many files, or that have many tests relevant to the defect. Java techniques are more likely to produce patches for high-priority defects. Neither the time it took developers to fix a defect nor the test suite’s coverage correlates with the automated repair techniques’ ability to produce patches. Finally, automated repair techniques are less capable of fixing defects that require developers to add loops and new function calls, or to change method signatures. These findings identify strengths and shortcomings of the state of the art in automated program repair along new dimensions. The presented methodology can drive research toward improving the applicability of automated repair techniques to hard and important bugs.
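The core analysis relates a per-defect measure (for example, the size of the developer-written patch) to whether a technique produced a patch. A rank correlation such as Kendall's tau-b copes with a binary outcome and heavily tied data, so it is one reasonable way to express such a relationship. The sketch below computes it from scratch; the function name and the four-defect data set are hypothetical, and this is not necessarily the exact statistic or procedure used in the study.

```python
import math
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b rank correlation with the standard tie correction.

    Uses an O(n^2) loop over all pairs, which is fast enough for a few
    hundred defects.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(x)
    concordant = discordant = 0
    tied_x = tied_y = 0  # pairs tied in x (resp. y), including pairs tied in both
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of x[i] - x[j]
            dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of y[i] - y[j]
            if dx == 0:
                tied_x += 1
            if dy == 0:
                tied_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) // 2  # total number of pairs
    denom = math.sqrt((n0 - tied_x) * (n0 - tied_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denom


# Hypothetical data: lines of code in each developer-written patch, and
# whether a repair technique produced a patch for that defect (1) or not (0).
patch_loc = [1, 2, 5, 10]
patched = [1, 1, 0, 0]
print(round(kendall_tau_b(patch_loc, patched), 3))  # -0.816
```

A negative tau means that defects with larger developer patches were less often patched, which is the direction the abstract reports. For real analyses, SciPy's `scipy.stats.kendalltau` (tau-b by default) also reports a p-value.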

Acknowledgements
This work is supported by the National Science Foundation under grants CCF-1453474 and CCF-1564162.
Additional information
Communicated by: Martin Monperrus and Westley Weimer
Appendices
Appendix A: Importance and Difficulty Data
Table 1 describes the relevant concrete parameters for each of the bug tracking systems, project-hosting platforms, and defect benchmarks. We do not explain the meaning of each system- or platform-specific parameter name; that information is available from the underlying bug tracking systems and project-hosting platforms. Table 2 shows the mapping from concrete parameters to abstract parameters and to the five defect characteristics.
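Table 2 is not reproduced here, but the general idea of mapping tracker-specific (concrete) values onto a shared abstract scale can be sketched as a lookup table. Everything in this sketch is an assumption for illustration: the tracker keys, the `PRIORITY_MAP` entries (common Bugzilla- and Jira-style priority labels), the 1-to-5 scale, and the function name `abstract_priority` are hypothetical and are not taken from Table 2.

```python
from typing import Optional

# Hypothetical mapping from (tracker, concrete priority value) to an abstract
# ordinal priority, where 1 is the most urgent. Illustrative labels only;
# these are NOT the entries of Table 2.
PRIORITY_MAP = {
    ("bugzilla", "P1"): 1,
    ("bugzilla", "P2"): 2,
    ("bugzilla", "P3"): 3,
    ("bugzilla", "P4"): 4,
    ("bugzilla", "P5"): 5,
    ("jira", "Blocker"): 1,
    ("jira", "Critical"): 2,
    ("jira", "Major"): 3,
    ("jira", "Minor"): 4,
    ("jira", "Trivial"): 5,
}


def abstract_priority(tracker: str, concrete_value: Optional[str]) -> Optional[int]:
    """Return the abstract priority, or None if it is missing or unmapped."""
    if concrete_value is None:
        return None  # the tracker did not record a value for this defect
    return PRIORITY_MAP.get((tracker.lower(), concrete_value.strip()))


print(abstract_priority("Jira", "Critical"))  # 2
print(abstract_priority("bugzilla", None))    # None
```

Returning `None` instead of guessing a default keeps unannotated defects distinguishable from annotated ones, which matters when counting how many defects each abstract parameter covers.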
Appendix B: Availability of Data for Annotating Defects
Table 3 shows which abstract parameters were available in the issue tracking systems used by the ManyBugs and Defects4J projects, and how the corresponding concrete parameters were used to annotate the defects. Figure 16 shows the number of defects annotated for each abstract parameter using concrete parameters from the bug trackers and benchmarks.
Figure 16: The number of defects annotated for each abstract parameter using the information described in Table 3 and data available in the ManyBugs and Defects4J benchmarks.
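Counting how many defects received a value for each abstract parameter, as Figure 16 reports, reduces to tallying non-missing annotations. A minimal sketch, assuming each defect's annotations are stored as a dictionary in which `None` marks a parameter that neither a bug tracker nor the benchmark provided; the function name, the parameter names, and the data are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional


def count_annotated(defects: Iterable[Mapping[str, Optional[object]]]) -> Counter:
    """Count, per abstract parameter, how many defects have a non-missing value."""
    counts: Counter = Counter()
    for annotations in defects:
        for parameter, value in annotations.items():
            if value is not None:  # None means the value was unavailable
                counts[parameter] += 1
    return counts


# Hypothetical annotations for three defects.
defects = [
    {"priority": 2, "time_to_fix": 14},
    {"priority": None, "time_to_fix": 3},
    {"priority": 1, "time_to_fix": 7},
]
print(dict(count_annotated(defects)))  # {'priority': 2, 'time_to_fix': 3}
```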
About this article
Cite this article
Motwani, M., Sankaranarayanan, S., Just, R. et al. Do automated program repair techniques repair hard and important bugs? Empir Software Eng 23, 2901–2947 (2018). https://doi.org/10.1007/s10664-017-9550-0
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10664-017-9550-0