ABSTRACT
Version histories of code contain useful information, and thanks to open source software, much of this data is public. However, searching through large repository histories is difficult because no dedicated tool exists for searching code changes. This paper presents DiffSearch, the first efficient and scalable search engine for code changes. Given a list of repositories and a query, DiffSearch retrieves matching code changes within a few seconds. We design a language-agnostic approach and test it on three popular programming languages: Java, JavaScript, and Python. We also design a query language that extends the supported languages. We evaluate DiffSearch in three steps. First, we measure a recall of 81.8%, 89.6%, and 90.4% for Java, Python, and JavaScript, respectively, with an average response time below five seconds. Second, we demonstrate its scalability on a large dataset of one million code changes. Last, we present a case study showing one possible application, in which DiffSearch gathers a dataset of 74,903 Java bug fixes.