Abstract
With the usage of version control systems, many bug fixes have accumulated over the years. Researchers have proposed various automatic program repair (APR) approaches that reuse past fixes to fix new bugs. However, some fundamental questions, such as how new fixes overlap with old fixes, have not been investigated. Intuitively, the overlap between old and new fixes decides how APR approaches can construct new fixes with old ones. Based on this intuition, we systematically designed six overlap metrics, and performed an empirical study on 5,735 bug fixes to investigate the usefulness of past fixes when composing new fixes. For each bug fix, we created delta graphs (i.e., program dependency graphs for code changes), and identified how bug fixes overlap with each other in terms of the content, code structures, and identifier names of fixes. Our results show that if an APR approach knows all code name changes and composes new fixes by fully or partially reusing the content of past fixes, only 2.1% and 3.2% new fixes can be created from single or multiple past fixes in the same project, compared with 0.9% and 1.2% fixes created from past fixes across projects. However, if an APR approach knows all code name changes and composes new fixes by fully or partially reusing the code structures of past fixes, up to 41.3% and 29.7% new fixes can be created. By making the above observations and revealing other ten findings, we investigated the upper bound of reusable past fixes and composable new fixes, exploring the potential of existing and future APR approaches.







Similar content being viewed by others
Notes
https://issues.apache.org/jira/browse/LUCENE-1510. To save space, in the rest of the paper, we present only ids of fixes, but do not present their urls. All the fixes come from Apache, and their urls can be generated by replacing the id in the above url.
PI is more restricted than PS and PN, but not the combination of the two. As a result, Column PI is not the sum of Columns PS and PN.
References
Barr ET, Brun Y, Devanbu P, Harman M, Sarro F (2014) The plastic surgery hypothesis. In: Proceedings of the ESEC/FSE, pp 306–317
Dagenais B, Hendren LJ (2008) Enabling static analysis for partial Java programs. In: Proceedings of the 23rd OOPSLA, pp 313–328
Fluri B, Wursch M, PInzger M, Gall HC (2007) Change distilling: tree differencing for fine-grained source code change extraction. IEEE Trans Softw Eng 33(11):725–743
Gabel M, Su Z (2010) A study of the uniqueness of source code. In: Proceedings of the FSE, pp 147–156
Gao Q, Zhang H, Wang J, Xiong Y, Zhang L, Mei H (2015) Fixing recurring crash bugs via analyzing Q&A sites. In: Proceedings of the 30th ASE, pp 307–318
Golub GH, Heath M, Wahba G (1979) Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21(2):215–223
Higo Y, Ohtani A, Hayashi S, Hata H, Shinji K (2015) Toward reusing code changes. In: Proceedings of the MSR, pp 372–376
Jeffrey D, Feng M, Gupta N, Gupta R (2009) Bugfix: a learning-based tool to assist developers in fixing bugs. In: Proceedings of the 17th ICPC, pp 70–79
Jin G, Song L, Shi X, Scherpelz J, Lu S (2012) Understanding and detecting real-world performance bugs. In: Proceedings of the 33rd PLDI
Kaleeswaran S, Tulsian V, Kanade A, Orso A (2014) Minthint: automated synthesis of repair hints. In: Proceeding of the ICSE, pp 266–276
Kamiya T, Kusumoto S, Inoue K (2002) CCFinder: a multilinguistic token-based code clone detection system for large scale source code. IEEE Trans Softw Eng 654–670
Kim S, Pan K, Whitehead EEJ Jr (2006) Memories of bug fixes. In: Proceedings of the ESEC/FSE, pp 35–45
Kim S, Zimmermann T, Whitehead EJ Jr, Zeller A (2007) Predicting faults from cached history. In: Proceedings of the 29th ICSE, pp 489–498
Kim D, Nam J, Song J, Kim S (2013) Automatic patch generation learned from human-written patches. In: Proceedings of the 35th ICSE, pp 802–811
Kuhn HW (1955) The hungarian method for the assignment problem. Naval Res Logist Q 2(1–2):83–97
Le Goues C, Dewey-Vogt M, Forrest S, Weimer W (2012) A systematic study of automated program repair: fixing 55 out of 105 bugs for $8 each. In: Proceedings of the 34th ICSE, pp 3–13
Le Goues C, Holtschulte N, Smith EK, Brun Y, Devanbu P, Forrest S, Weimer W (2015) The ManyBugs and IntroClass benchmarks for automated repair of C programs. IEEE Trans Softw Eng 41(12):1236–1256
Liang G, Wang Q, Xie T, Mei H (2013) Inferring project-specific bug patterns for detecting sibling bugs. In: Proceedings of the ESEC/FSE, pp 565–575
Liu C, Yang J, Tan L, Hafiz M (2013) R2Fix: automatically generating bug fixes from bug reports. In: Proceedings of the 6th ICST, pp 282–291
Long F, Rinard M (2015) Staged program repair with condition synthesis. In: Proceedings of the 10th ESEC/FSE, pp 166–178
Long F, Rinard M (2016) Automatic patch generation by learning correct code. In: Proceedings of the 43rd POPL, pp 298–312
Lu S, Park S, Seo E, Zhou Y (2008) Learning from mistakes—a comprehensive study on real world concurrency bug characteristics. In: Proceedings of the 13th ASPLOS
Martinez M, Monperrus M (2013) Mining software repair models for reasoning on the search space of automated program fixing. Empir Softw Eng 20(1):176–205
Martinez M, Weimer W, Monperrus M (2014) Do the fix ingredients already exist? An empirical inquiry into the redundancy assumptions of program repair approaches. In: Proceedings of the 36th ICSE, pp 492–495
Meng N, Kim M, McKinley K S (2011) Sydit: creating and applying a program transformation from an example. In: Proceedings of the ESEC/FSE, pp 440–443
Munkres J (1957) Algorithms for the assignment and transportation problems. J Soc Ind Appl Math 5(1):32–38
Myers EW (1986) Ano(nd) difference algorithm and its variations. Algorithmica 1(1):251–266
Negara S, Codoban M, Dig D, Johnson RE (2014) Mining fine-grained code changes to detect unknown change patterns. In: Proceedings of the 36th ICSE, pp 803–813
Nguyen TT, Nguyen HA, Pham NH, Al-Kofahi J, Nguyen TN (2010) Recurring bug fixes in object-oriented programs. In: Proceedings of the 32nd ICSE, pp 315–324
Nguyen HA, Nguyen AT, Nguyen TT, Nguyen TN, Rajan H (2013) A study of repetitiveness of code changes in software evolution. In: Proceedings of the 28th ASE, pp 180–190
Qi Y, Mao X, Lei Y, Dai Z, Wang C (2014) The strength of random search on automated program repair. In: Proceedings of the 36th ICSE, pp 254–265
Qi Z, Long F, Achour S, Rinard M (2015) An analysis of patch plausibility and correctness for generate-and-validate patch generation systems. In: Proceedings of the ISSTA, pp 24–36
Ray B, Kim M (2012) A case study of cross-system porting in forked projects. In: Proceedings of the ESEC/FSE
Rolim R, Soares G, D’Antoni L, Polozov O, Gulwani S, Gheyi R, Suzuki R, Hartmann B (2017) Learning syntactic program transformations from examples. In: Proceedings of the 39th ICSE, pp 404–415
Tan L, Liu C, Li Z, Wang X, Zhou Y, Zhai C (2014) Bug characteristics in open source software. Empir Softw Eng 19(6):1665–1705
Tan M, Tan L, Dara S, Mayeux C (2015) Online defect prediction for imbalanced data. In: Proceedings of the ICSE, software engineering in practice, pp 99–108
Taneja K, Dig D, Xie T (2007) Automated detection of API refactorings in libraries. In: Proceedings of the 22nd ASE, pp 377–380
Weimer W, Nguyen T, Le Goues C, Forrest S (2009) Automatically finding patches using genetic programming. In: Proceedings of the 31st ICSE, pp 364–374
Xiong Y, Wang J, Yan R, Zhang J, Han S, Huang G, Zhang L (2017) Precise condition synthesis for program repair. In: Proceedings of the ICSE, pp 416–426
Yin Z, Yuan D, Zhou Y, Pasupathy S, Bairavasundaram L (2011a) How do fixes become bugs? In: Proceedings of the ESEC/FSE, pp 26–36
Yin Z, Ma X, Zheng J, Zhou Y, Bairavasundaram L N, Pasupathy S (2011b) An empirical study on configuration errors in commercial and open source systems. In: Proceedings of the 23rd SOSP, pp 159–172
Yue R, Meng N, Wang Q (2017) A characterization study of repeated bug fixes. In: Proceedings of the ICSME, pp 422–432
Zhang J, Wang Z, Zhang L, Hao D, Zang L, Cheng S, Zhang L (2016) Predictive mutation testing. In: Proceedings of the ISSTA, pp 342–353
Zhong H, Meng N (2017) Poster: an empirical study on using hints from past fixes. In: Proceedings of the 39th ICSE, pp 144–145
Zhong H, Su Z (2015) An empirical study on real bug fixes. In: Proceedings of the 37th ICSE, pp 913–923
Zhong H, Wang X (2017) Boosting complete-code tool for partial program. In: Proceedings of the ASE, pp 671–681
Zhong H, Zhang L, Xie T, Mei H (2009) Inferring resource specifications from natural language API documentation. In: Proceedings of the 24th ASE, pp 307–318
Acknowledgements
Hao Zhong is sponsored by the 973 Program in China (No. 2015CB352203), the National Nature Science Foundation of China No. 61572313, and the grant of Science and Technology Commission of Shanghai Municipality No. 15DZ1100305. Na Meng is sponsored by the NSF CCF No. 1565827 and ONR-450487.
Author information
Authors and Affiliations
Corresponding author
Additional information
Communicated by: Sunghun Kim
This paper is an extended version of a poster paper (Zhong and Meng 2017) that is presented in the 39th International Conference on Software Engineering, 2017.
Rights and permissions
About this article
Cite this article
Zhong, H., Meng, N. Towards reusing hints from past fixes. Empir Software Eng 23, 2521–2549 (2018). https://doi.org/10.1007/s10664-017-9584-3
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10664-017-9584-3