DOI: 10.1145/3377813.3381356
research-article

Precfix: large-scale patch recommendation by mining defect-patch pairs

Published: 18 September 2020

Abstract

Patch recommendation is the process of identifying errors in software systems and suggesting suitable fixes for them. It can significantly improve developer productivity by reducing both debugging and repair time. Existing techniques usually rely on complete test suites and detailed debugging reports, which are often absent in practical industrial settings. In this paper, we propose Precfix, a pragmatic approach that targets large-scale industrial codebases and makes recommendations based on previously observed debugging activities. Precfix collects defect-patch pairs from development histories, performs clustering, and extracts generic reusable patching patterns as recommendations. We conducted an experimental study on an industrial codebase with 10K projects involving diverse defect patterns. We extracted 3K templates of defect-patch pairs, which have been successfully applied to the entire codebase. Our approach makes recommendations within milliseconds and achieves a false positive rate of 22%, as confirmed by manual review. The majority (10/12) of the interviewed developers appreciated Precfix, which has been rolled out at Alibaba to support various critical businesses.
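To make the collect, cluster, and extract steps named in the abstract more concrete, the following is a minimal illustrative sketch in Python. It is not the Precfix implementation: the abstract does not specify the similarity measure, the clustering algorithm, or the template format, so the token-level Jaccard similarity, the greedy threshold clustering, the `@HOLE` placeholder, and every function name below are assumptions chosen only for illustration.

```python
import re
from typing import List, Tuple

# A defect-patch pair: (code before the fix, code after the fix).
Pair = Tuple[str, str]


def tokens(code: str) -> List[str]:
    """Split a snippet into identifiers, numbers, two-char comparisons, and symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[!=<>]=|\S", code)


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two snippets."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def pair_similarity(p: Pair, q: Pair) -> float:
    """Average of defect-side and patch-side similarity (an assumed metric)."""
    return (jaccard(p[0], q[0]) + jaccard(p[1], q[1])) / 2


def cluster(pairs: List[Pair], threshold: float = 0.5) -> List[List[Pair]]:
    """Greedy clustering: join the first cluster whose first member is similar enough."""
    clusters: List[List[Pair]] = []
    for p in pairs:
        for c in clusters:
            if pair_similarity(p, c[0]) >= threshold:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters


def generalize(snippets: List[str]) -> str:
    """Keep tokens shared at each position; replace differing ones with @HOLE."""
    token_lists = [tokens(s) for s in snippets]
    if len({len(t) for t in token_lists}) != 1:
        return snippets[0]  # lengths differ: fall back to the first snippet as-is
    return " ".join(
        column[0] if len(set(column)) == 1 else "@HOLE"
        for column in zip(*token_lists)
    )


def extract_templates(pairs: List[Pair], threshold: float = 0.5) -> List[Pair]:
    """Produce one (defect template, patch template) per cluster."""
    return [
        (generalize([d for d, _ in c]), generalize([p for _, p in c]))
        for c in cluster(pairs, threshold)
    ]
```

Given two null-check fixes that differ only in the variable name, this sketch would group them into one cluster and emit a single template with the variable abstracted away; an unrelated fix would land in its own cluster. A production system would need a scalable clustering method and a more robust generalization step than position-wise token comparison.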

Published In

ICSE-SEIP '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Software Engineering in Practice
June 2020
258 pages
ISBN:9781450371230
DOI:10.1145/3377813

Sponsors

In-Cooperation

  • KIISE: Korean Institute of Information Scientists and Engineers
  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. defect detection
  2. patch generation
  3. patch recommendation

Qualifiers

  • Research-article

Conference

ICSE '20
Cited By

  • (2024) Evolving Paradigms in Automated Program Repair: Taxonomy, Challenges, and Opportunities. ACM Computing Surveys 57:2 (1-43). DOI: 10.1145/3696450. Online publication date: 10-Oct-2024
  • (2023) A Survey of Learning-based Automated Program Repair. ACM Transactions on Software Engineering and Methodology 33:2 (1-69). DOI: 10.1145/3631974. Online publication date: 23-Dec-2023
  • (2023) Mining Fix Patterns with Context Information for Automatic Program Repair. 2023 IEEE/ACM International Workshop on Automated Program Repair (APR) (1-8). DOI: 10.1109/APR59189.2023.00007. Online publication date: May-2023
  • (2023) BTLink: automatic link recovery between issues and commits based on pre-trained BERT model. Empirical Software Engineering 28:4. DOI: 10.1007/s10664-023-10342-7. Online publication date: 12-Jul-2023
  • (2023) CrossFix: Resolution of GitHub issues via similar bugs recommendation. Journal of Software: Evolution and Process. DOI: 10.1002/smr.2554. Online publication date: 23-Mar-2023
  • (2021) Sirius: Static Program Repair with Dependence Graph-Based Systematic Edit Patterns. 2021 IEEE International Conference on Software Maintenance and Evolution (ICSME) (437-447). DOI: 10.1109/ICSME52107.2021.00045. Online publication date: Sep-2021
