Code2Img: Tree-Based Image Transformation for Scalable Code Clone Detection


Abstract:

Code clone detection is an active research area in software engineering. Clone detection faces two core demands: scalability and the ability to detect complicated clones. For scalability, existing approaches treat source code as text or a token sequence and compute similarity over it. However, text-based and token-based approaches struggle to detect complicated clone types because they do not consider code structure. Methods based on intermediate representations of code can detect complex clone types effectively, but the complexity of those representations limits their scalability. In this paper, we propose Code2Img, a tree-based code clone detector that scales while detecting complicated clones effectively. Given the source code, we first filter candidates with an inverted index to locate suspected clones. For each suspected clone, we create an adjacency image from the adjacency matrix of the normalized abstract syntax tree (AST). We then design an image encoder to further highlight structural details and refine the pixels of the image. Specifically, we employ a Markov model to encode the adjacency image into a state probability image and remove its useless pixels. In this way, the original complex tree is transformed into a one-dimensional vector that preserves the structural features of the AST. Finally, we detect clones by calculating the Jaccard similarity of these vectors. We conduct comparative evaluations of effectiveness and scalability against eight other state-of-the-art clone detectors (SourcererCC, NIL, LVMapper, Nicad, Siamese, CCAligner, Deckard, and Yang2018). The experimental results show that Code2Img achieves the best performance among all compared tools in both detection effectiveness and scalability, indicating that Code2Img is applicable to scalable detection of complicated clones.
Published in: IEEE Transactions on Software Engineering (Volume: 49, Issue: 9, 01 September 2023)
Page(s): 4429 - 4442
Date of Publication: 17 July 2023
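
The core of the pipeline in the abstract (AST adjacency matrix, Markov-style probability encoding, Jaccard comparison) can be illustrated with a minimal sketch. This is not the authors' implementation, and several choices here are assumptions: Python's built-in ast module stands in for the paper's parser and normalization step, the adjacency matrix is indexed by node type, the Markov encoding is taken to be row normalization of edge counts, and the weighted (min/max) form of Jaccard similarity is used for real-valued vectors. Inverted-index filtering is not shown, and storing only non-zero entries is a loose analogue of removing useless pixels. All function names are hypothetical.

```python
import ast
from collections import defaultdict


def adjacency_counts(source):
    """Count parent->child edges between AST node types.

    Assumption: a type-level adjacency matrix, stored sparsely as
    {(parent_type, child_type): count}.
    """
    counts = defaultdict(float)
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            counts[(type(parent).__name__, type(child).__name__)] += 1.0
    return counts


def state_probabilities(counts):
    """Row-normalize edge counts so each parent type's outgoing
    edges sum to 1 (assumed form of the Markov encoding)."""
    row_totals = defaultdict(float)
    for (parent, _child), n in counts.items():
        row_totals[parent] += n
    return {edge: n / row_totals[edge[0]] for edge, n in counts.items()}


def weighted_jaccard(a, b):
    """Weighted Jaccard similarity: sum of minima over sum of maxima."""
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den else 1.0


def similarity(src_a, src_b):
    """Compare two code fragments via their encoded AST vectors."""
    return weighted_jaccard(
        state_probabilities(adjacency_counts(src_a)),
        state_probabilities(adjacency_counts(src_b)),
    )


if __name__ == "__main__":
    renamed_a = "def f(a, b):\n    return a + b\n"
    renamed_b = "def g(x, y):\n    return x + y\n"
    different = "for i in range(3):\n    print(i)\n"
    print(similarity(renamed_a, renamed_b))
    print(similarity(renamed_a, different))
```

Because the vectors in this sketch depend only on node types, two fragments that differ only in identifier names encode identically, while structurally different fragments score lower. A real detector would pair this with a similarity threshold, which the abstract does not specify.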
