DOI: 10.1145/3510454.3522678
short-paper

Efficiently and precisely searching for code changes with DiffSearch

Published: 19 October 2022

ABSTRACT

Version histories of code contain useful information and these data are public, thanks to open source software. However, searching through large repository histories can be complex, because there is no specific tool to search for code changes. This paper presents DiffSearch, the first efficient and scalable search engine for code changes. Given a list of repositories and a query, DiffSearch can retrieve specific code changes in a few seconds. We design a language-agnostic approach that we test on three popular programming languages: Java, JavaScript, and Python, and we design a query language that is an extension of the supported languages. We evaluate DiffSearch in three steps. First, we measure a recall of 81.8%, 89.6%, and 90.4% for Java, Python, and JavaScript, respectively, and an average response time lower than five seconds. Second, we demonstrate its scalability with a large dataset of one million code changes. Last, we perform a case study to show one of its possible applications, where DiffSearch gathers a dataset of 74,903 Java bug fixes.
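
The recall figures above can be read with the standard information-retrieval definition: the share of all code changes that truly match a query which the search actually returns. The following is a minimal Python sketch of that measure only. It is not the authors' evaluation harness; the function name `recall` and the change identifiers are hypothetical, and it assumes the ground-truth set of matching changes is known.

```python
from __future__ import annotations


def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Return the fraction of relevant code changes that were retrieved."""
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant items")
    return len(retrieved & relevant) / len(relevant)


# Hypothetical change identifiers, for illustration only.
relevant = {"c1", "c2", "c3", "c4", "c5"}   # changes that truly match the query
retrieved = {"c1", "c2", "c3", "c4", "c9"}  # changes the search returned

print(recall(retrieved, relevant))  # 0.8
```

Here four of the five matching changes are found, so recall is 0.8. The extra result `c9` lowers precision but does not affect recall.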


Published in

ICSE '22: Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings
May 2022
394 pages
ISBN: 9781450392235
DOI: 10.1145/3510454

Copyright © 2022 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 276 of 1,856 submissions (15%)
