DOI: 10.1145/3387904.3389289
short-paper

Staged Tree Matching for Detecting Code Move across Files

Published: 12 September 2020

ABSTRACT

Developers often need to understand source code differences in the course of their work. GumTree is a tool that detects tree-based source code differences. It constructs abstract syntax trees from the source code before and after a given change and then identifies inserted, deleted, and moved subtrees and updated nodes. GumTree reports source code differences in terms of these four kinds of information. However, GumTree calculates the difference for each file individually, so it cannot detect moves of code fragments across files. In this research, we propose (1) constructing a single abstract syntax tree from all source files in a project and (2) performing staged tree matching to detect cross-file code moves efficiently and accurately. We have conducted a pilot experiment applying our technique to open source projects. We detected code moves across files in every project, 76,600 such moves in total.
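The abstract does not say what the stages are or how subtrees are compared, so the following is only an illustrative sketch of the two proposed ideas, not the authors' implementation and not GumTree's API. It assumes (a) a synthetic project root under which every per-file tree is hung, (b) two stages, same-file matching first and cross-file matching of the leftovers second, and (c) exact subtree hashes as the similarity test. All names (`Node`, `project_tree`, `staged_match`) are hypothetical.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass
class Node:
    """A toy AST node: a type label, an optional value, and children."""
    label: str
    value: str = ""
    children: list["Node"] = field(default_factory=list)


def subtree_hash(node: Node) -> str:
    """Hash a subtree so that structurally identical subtrees compare equal."""
    h = hashlib.sha1(f"{node.label}:{node.value}".encode())
    for child in node.children:
        h.update(subtree_hash(child).encode())
    return h.hexdigest()


def project_tree(files: dict[str, Node]) -> Node:
    """Assumption (a): hang every per-file AST under one synthetic root."""
    return Node("Project", children=[
        Node("File", path, [ast]) for path, ast in sorted(files.items())
    ])


def declarations(project: Node) -> list[tuple[str, Node]]:
    """List (file path, top-level declaration) pairs of a project tree."""
    return [(f.value, d) for f in project.children
            for d in f.children[0].children]


def staged_match(before: Node, after: Node) -> list[tuple[str, str, str]]:
    """Return cross-file moves as (declaration name, old path, new path)."""
    unmatched_after = declarations(after)
    leftover = []

    # Stage 1 (assumed): match identical subtrees that stay in the same file.
    for path, decl in declarations(before):
        hit = next((x for x in unmatched_after
                    if x[0] == path
                    and subtree_hash(x[1]) == subtree_hash(decl)), None)
        if hit is not None:
            unmatched_after.remove(hit)
        else:
            leftover.append((path, decl))

    # Stage 2 (assumed): match what is left against any file in the project.
    moves = []
    for path, decl in leftover:
        hit = next((x for x in unmatched_after
                    if subtree_hash(x[1]) == subtree_hash(decl)), None)
        if hit is not None:
            unmatched_after.remove(hit)
            moves.append((decl.value, path, hit[0]))
    return moves
```

Running stage 1 to completion before stage 2 means a declaration that still exists in its original file is never reported as a move, and the more expensive project-wide search only sees the subtrees that stage 1 could not place.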

Published in

ICPC '20: Proceedings of the 28th International Conference on Program Comprehension
July 2020
481 pages
ISBN: 9781450379588
DOI: 10.1145/3387904

Copyright © 2020 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers

• short-paper
• Research
• Refereed limited