ABSTRACT
In software development, developers often need to understand the differences between versions of source code. GumTree is a tool that detects source code differences at the tree level. It constructs abstract syntax trees from the source code before and after a given change, and then identifies inserted, deleted, and moved subtrees as well as updated nodes. GumTree reports source code differences in terms of these four kinds of changes. However, GumTree computes the difference for each file individually, so it cannot detect moves of code fragments across files. In this research, we propose (1) constructing a single abstract syntax tree from all source files in a project and (2) performing staged tree matching to detect across-file code moves efficiently and accurately. We conducted a pilot experiment applying our technique to open source projects. We were able to detect code moves across files in all the projects, with 76,600 such moves in total.
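The idea of matching over a whole project rather than file by file can be illustrated with a minimal sketch. This is not GumTree's algorithm or the matching procedure proposed here: the abstract does not specify the stages, so the two stages below (same-file matching first, then project-wide matching of the leftovers) are an assumption, as are the names `Files`, `fingerprints`, and `detect_cross_file_moves`. For simplicity the sketch uses Python's `ast` module and only considers identical top-level functions.

```python
import ast
from typing import Dict, List, Tuple

# Hypothetical project snapshot: file path -> source text.
Files = Dict[str, str]


def fingerprints(files: Files) -> List[Tuple[str, str, str]]:
    """Return (path, function name, structural fingerprint) for each top-level function.

    Collecting entries from every file into one list plays the role of a
    single project-wide tree in this simplified setting.
    """
    out = []
    for path, source in files.items():
        for node in ast.parse(source).body:
            if isinstance(node, ast.FunctionDef):
                # ast.dump omits line/column attributes by default, so two
                # structurally identical functions get the same fingerprint.
                out.append((path, node.name, ast.dump(node)))
    return out


def detect_cross_file_moves(before: Files, after: Files) -> List[Tuple[str, str, str]]:
    """Return (function name, old path, new path) for functions moved across files."""
    old, new = fingerprints(before), fingerprints(after)

    # Stage 1 (assumed): match identical subtrees that stayed in the same file.
    unmatched_new = list(new)
    unmatched_old = []
    for item in old:
        if item in unmatched_new:
            unmatched_new.remove(item)
        else:
            unmatched_old.append(item)

    # Stage 2 (assumed): match the leftovers against the rest of the project.
    moves = []
    for path, name, fp in unmatched_old:
        for cand in unmatched_new:
            if cand[2] == fp:
                unmatched_new.remove(cand)
                moves.append((name, path, cand[0]))
                break
    return moves
```

Running the cheap same-file stage first keeps the project-wide stage small, which is one plausible reason to stage the matching. A per-file differ would instead report a function moved between files as a deletion in one file and an insertion in the other.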
Index Terms
- Staged Tree Matching for Detecting Code Move across Files