Abstract
In this paper, we propose TPCaps, a code clone detection framework that addresses the inefficiency of semantic clone detection and the difficulty of locating code clones. TPCaps is based on a capsule network (CapsNet) combined with tokens and the Program Dependence Graph (PDG), which improves both the processing rate and the detection capability. First, TPCaps determines tokens through dataset partitioning and semantic signatures, filtering out the valid code clones. Using these tokens, it detects Type-1 and Type-2 clones effectively. In addition, TPCaps generates a PDG composed of the data dependencies and control dependencies extracted from the code. Taking the PDG as input, an improved capsule network, which we call RCapsNet, detects Type-3 and Type-4 clones. RCapsNet extends CapsNet with a selective search algorithm combined with a Region Proposal Network (RPN): CapsNet handles the clone features to achieve detection and classification, while the RPN processes the location information and updates and trains the candidate frames to obtain a specific clone location. In the experimental section, we evaluate the recall and precision of the model; TPCaps achieves high accuracy compared with other models.
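To make the token-level stage concrete, the following is a minimal sketch of how Type-1 and Type-2 clones can be told apart by comparing token sequences. It is a generic illustration, not the TPCaps token selection itself (which relies on dataset partitioning and semantic signatures); the tokenizer, the `KEYWORDS` set, and the function names `tokens`, `normalize`, and `clone_type` are all assumptions introduced here for a C-like language.

```python
import re

# Hypothetical keyword set for a C-like language (assumption, not from the paper).
KEYWORDS = {"int", "float", "void", "return", "if", "else", "for", "while"}

# Simple tokenizer: comments, string literals, numbers, names, single symbols.
TOKEN_RE = re.compile(r'''
    //[^\n]*              # line comment
  | /\*.*?\*/             # block comment
  | "(?:\\.|[^"\\])*"     # string literal
  | \d+(?:\.\d+)?         # numeric literal
  | [A-Za-z_]\w*          # identifier or keyword
  | \S                    # any other single non-space character
''', re.VERBOSE | re.DOTALL)


def tokens(code):
    """Split code into tokens, dropping whitespace and comments."""
    out = []
    for match in TOKEN_RE.finditer(code):
        tok = match.group(0)
        if tok.startswith("//") or tok.startswith("/*"):
            continue
        out.append(tok)
    return out


def normalize(toks):
    """Replace identifiers with ID and literals with LIT; keep keywords and symbols."""
    norm = []
    for tok in toks:
        if tok in KEYWORDS:
            norm.append(tok)
        elif re.fullmatch(r"[A-Za-z_]\w*", tok):
            norm.append("ID")
        elif tok[0].isdigit() or tok[0] == '"':
            norm.append("LIT")
        else:
            norm.append(tok)
    return norm


def clone_type(a, b):
    """Return 'Type-1', 'Type-2', or None for two code fragments."""
    ta, tb = tokens(a), tokens(b)
    if ta == tb:
        return "Type-1"  # identical up to whitespace and comments
    if normalize(ta) == normalize(tb):
        return "Type-2"  # identical up to identifier and literal names
    return None  # not a token-level clone; needs structural or semantic analysis


f1 = "int sum(int a, int b) { return a + b; } // add"
f2 = "int sum(int a,int b){\n  return a + b;\n}"
f3 = "int add(int x, int y) { return x + y; }"
f4 = "int mul(int x, int y) { return x * y; }"

print(clone_type(f1, f2), clone_type(f1, f3), clone_type(f1, f4))
```

Fragments that fall through to `None` here are the ones a token comparison cannot resolve, which is where a dependence-based representation such as the PDG becomes relevant for Type-3 and Type-4 clones.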
Data availability statement
The processed data required to reproduce these findings cannot be shared at this time as the data also forms part of an ongoing study.
Acknowledgements
This work was supported in part by the State Grid Jiangxi Information & Telecommunication Company Project under Grant 52183520007V.
Author information
Contributions
Yuancheng Li contributed to the conception of the study.
Chaohang Yu designed the experiment and wrote the manuscript.
Yaqi Cui performed the experiment and the data analyses and wrote the manuscript.
Ethics declarations
Ethical statement
I confirm that there is no misconduct in this manuscript submission and declare that the research satisfies all the requirements of the submission guidelines regarding the ethical responsibilities of authors.
Consent to participate
Consent to participate was obtained from all individual participants included in the study.
Consent for publication
Written informed consent for publication was obtained from all participants.
Conflicts of interest/competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Li, Y., Yu, C. & Cui, Y. TPCaps: a framework for code clone detection and localization based on improved CapsNet. Appl Intell 53, 16594–16605 (2023). https://doi.org/10.1007/s10489-022-03158-3
DOI: https://doi.org/10.1007/s10489-022-03158-3