
Do automated program repair techniques repair hard and important bugs?

Empirical Software Engineering

Abstract

Existing evaluations of automated repair techniques focus on the fraction of defects for which a technique can produce a patch, the time needed to produce patches, and how well patches generalize to the intended specification. However, these evaluations have not focused on the applicability of repair techniques or on the characteristics of the defects that these techniques can repair. Questions such as “Can automated repair techniques repair defects that are hard for developers to repair?” and “Are automated repair techniques less likely to repair defects that involve loops?” have not yet been answered. To address such questions, we annotate two large benchmarks totaling 409 C and Java defects in real-world software, ranging from 22K to 2.8M lines of code, with measures of each defect’s importance, the complexity of the developer-written patch, and the quality of the test suite. We then analyze the relationships between these measures and the ability of seven automated repair techniques (AE, GenProg, Kali, Nopol, Prophet, SPR, and TrpAutoRepair) to produce patches for the defects. We find that automated repair techniques are less likely to produce patches for defects that required developers to write a lot of code or edit many files, or that have many tests relevant to the defect. Java techniques are more likely to produce patches for high-priority defects. Neither the time it took developers to fix a defect nor the test suite’s coverage correlates with the automated repair techniques’ ability to produce patches. Finally, automated repair techniques are less capable of fixing defects that require developers to add loops and new function calls, or to change method signatures. These findings identify strengths and shortcomings of the state of the art in automated program repair along new dimensions. The presented methodology can drive research toward improving the applicability of automated repair techniques to hard and important bugs.
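The analysis described above relates a per-defect measure (for example, the size of the developer-written patch) to whether a technique produced a patch. As a minimal sketch of one way to quantify such a relationship, assuming Kendall's tau-b as the statistic (the abstract does not say which test the study used), the function below correlates a numeric measure with a binary outcome. The function name and the data are hypothetical and invented for illustration; they are not taken from the benchmarks.

```python
import math
from itertools import combinations


def kendall_tau_b(xs, ys):
    """Kendall's tau-b rank correlation with tie correction (O(n^2) pairwise version)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # pair tied in both variables: excluded from the statistic
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    # Each factor counts the pairs that are not tied in one of the variables.
    denom = math.sqrt((concordant + discordant + ties_x_only)
                      * (concordant + discordant + ties_y_only))
    if denom == 0:
        return float("nan")  # undefined when one variable is constant
    return (concordant - discordant) / denom


# Hypothetical data: lines changed in the developer patch, and whether a
# repair technique produced a patch for that defect (1 = yes, 0 = no).
patch_size = [2, 5, 40, 3, 120, 8]
patched = [1, 1, 0, 1, 0, 0]
print(kendall_tau_b(patch_size, patched))
```

In this made-up sample every small developer patch is paired with a produced patch, so the coefficient is negative; a negative value is the direction one would expect if larger developer patches go with fewer automatically produced patches. A library routine such as SciPy's `scipy.stats.kendalltau` could replace the hand-written loop for larger data sets.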


Acknowledgements

This work is supported by the National Science Foundation under grants CCF-1453474 and CCF-1564162.

Author information

Correspondence to Manish Motwani.

Additional information

Communicated by: Martin Monperrus and Westley Weimer

Appendices

Appendix A: Importance and Difficulty Data

Table 1 describes the relevant concrete parameters for each of the bug tracking systems, project-hosting platforms, and defect benchmarks. We omit the semantics of the platform-specific parameter names; this information is available from the underlying bug tracking systems and project-hosting platforms. Table 2 shows the mapping from concrete parameters to abstract parameters and then to the five defect characteristics.

Table 1 We used grounded theory to extract from bug tracking systems, project-hosting platforms, and defect benchmarks the concrete parameters relevant to defect importance and difficulty, as well as several other parameters interesting to correlate with automated repair techniques’ ability to repair the defect
Table 2 Mapping of the concrete parameters from Table 1 to the eleven abstract parameters and then to the five defect characteristics
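The two-level mapping in Table 2 can be pictured as a pair of lookup tables: concrete fields from a given tracker collapse into a shared abstract parameter, which in turn belongs to a defect characteristic. The sketch below assumes a simple dictionary representation; the tracker names, field names, the `annotate` helper, and the single characteristic shown are hypothetical placeholders, not the actual contents of Tables 1 and 2.

```python
# Hypothetical two-level mapping in the spirit of Table 2. The tracker names,
# field names, and characteristic label are illustrative assumptions only.
CONCRETE_TO_ABSTRACT = {
    ("bugzilla", "priority"): "priority",
    ("jira", "Priority"): "priority",
}

ABSTRACT_TO_CHARACTERISTIC = {
    "priority": "importance",
}


def annotate(tracker, fields):
    """Group one defect's concrete fields by characteristic and abstract parameter.

    Fields that have no mapping for the given tracker are ignored.
    """
    result = {}
    for field, value in fields.items():
        abstract = CONCRETE_TO_ABSTRACT.get((tracker, field))
        if abstract is None:
            continue  # not a field used for annotation on this tracker
        characteristic = ABSTRACT_TO_CHARACTERISTIC[abstract]
        result.setdefault(characteristic, {})[abstract] = value
    return result


print(annotate("jira", {"Priority": "Major", "Assignee": "alice"}))
```

Keying the first table on the (tracker, field) pair lets differently named fields on different platforms feed the same abstract parameter, which is what allows defects from several trackers to be compared on one scale.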

Appendix B: Availability of Data for Annotating Defects

Table 3 describes which abstract parameters were available in the issue tracking systems used by the ManyBugs and Defects4J projects, and how the corresponding concrete parameters were used to annotate the defects. Figure 16 shows the number of defects annotated for each abstract parameter using concrete parameters from the bug trackers and benchmarks.

Table 3 Information about abstract parameters obtained from the issue tracking systems
Fig. 16 The number of defects annotated for each abstract parameter using the information described in Table 3 and data available in the ManyBugs and Defects4J benchmarks
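Because not every tracker exposes every abstract parameter, the per-parameter totals in Fig. 16 differ. A minimal sketch of how such totals could be tallied, assuming each defect's annotation is a dictionary in which `None` marks a value that was unavailable; the function name, parameter names, and data below are hypothetical.

```python
from collections import Counter


def count_annotated(defects):
    """Count, per abstract parameter, how many defects have a non-missing value."""
    counts = Counter()
    for params in defects:
        for name, value in params.items():
            if value is not None:  # None = not available from the tracker
                counts[name] += 1
    return counts


# Hypothetical annotations for three defects.
defects = [
    {"priority": "high", "time_to_fix_days": 3},
    {"priority": None, "time_to_fix_days": 12},
    {"priority": "low", "time_to_fix_days": None},
]
print(count_annotated(defects))
```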

About this article

Cite this article

Motwani, M., Sankaranarayanan, S., Just, R. et al. Do automated program repair techniques repair hard and important bugs?. Empir Software Eng 23, 2901–2947 (2018). https://doi.org/10.1007/s10664-017-9550-0

  • DOI: https://doi.org/10.1007/s10664-017-9550-0
