TPCaps: a framework for code clone detection and localization based on improved CapsNet

Li, Yuancheng; Yu, Chaohang; Cui, Yaqi

doi:10.1007/s10489-022-03158-3

Published: 12 December 2022

Applied Intelligence, volume 53, pages 16594–16605 (2023)

Yuancheng Li¹,
Chaohang Yu¹ &
Yaqi Cui¹

Abstract

In this paper, we propose TPCaps, a new code clone detection framework that addresses the inefficiency of semantic clone detection and the difficulty of locating code clones. Built on CapsNet with tokens and the Program Dependence Graph (PDG), TPCaps improves both the processing rate and the detection capability. First, TPCaps determines tokens by dataset partitioning and semantic signatures, filtering out the valid code clones. Using these tokens, it detects Type-1 and Type-2 clones effectively. In addition, TPCaps generates a PDG composed of the data dependencies and control dependencies extracted from the code. With the PDG as input, the improved capsule network, which we call RCapsNet, is able to detect Type-3 and Type-4 clones. Building on CapsNet, RCapsNet introduces a selective search algorithm combined with a Region Proposal Network (RPN): CapsNet handles the clone features to achieve detection and classification, while the RPN processes the location information and updates and trains the candidate frames to obtain a specific clone location. In the experimental section, we evaluate the recall and precision of the model. TPCaps shows high accuracy compared with other models.
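
To make the token-based stage concrete, the following is a minimal sketch of how Type-1 and Type-2 clones can be recognised by comparing token streams. It is not the TPCaps implementation: the regex tokenizer, the keyword set, the `ID`/`LIT` placeholders, and the function names `tokenize`, `normalize`, and `clone_type` are all assumptions made for illustration, and the dataset partitioning and semantic signature steps from the abstract are not modelled.

```python
import re

# Hypothetical keyword set for a C/Java-like language (not taken from the paper).
KEYWORDS = {"int", "float", "return", "if", "else", "for", "while"}

# Identifier | number | double-quoted string | any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\"[^\"]*\"|\S")


def tokenize(code):
    """Split code into tokens, ignoring whitespace and // or /* */ comments."""
    code = re.sub(r"//[^\n]*|/\*.*?\*/", " ", code, flags=re.S)
    return TOKEN_RE.findall(code)


def normalize(tokens):
    """Replace identifiers and literals with placeholders; keep keywords and symbols."""
    out = []
    for tok in tokens:
        if tok in KEYWORDS:
            out.append(tok)
        elif re.fullmatch(r"[A-Za-z_]\w*", tok):
            out.append("ID")
        elif re.fullmatch(r"\d+(?:\.\d+)?", tok) or tok.startswith('"'):
            out.append("LIT")
        else:
            out.append(tok)
    return out


def clone_type(a, b):
    """Return "Type-1", "Type-2", or None for two code fragments."""
    ta, tb = tokenize(a), tokenize(b)
    if ta == tb:
        # Identical apart from layout and comments.
        return "Type-1"
    if normalize(ta) == normalize(tb):
        # Identical after abstracting identifiers and literals.
        return "Type-2"
    return None


original = "int add(int a, int b) { return a + b; }"
reformatted = "int add(int a,int b){return a+b;} /* same logic */"
renamed = "int sum(int x, int y) {\n  // add two values\n  return x + y;\n}"

print(clone_type(original, reformatted))
print(clone_type(original, renamed))
```

This sketch maps every identifier to the same placeholder, so it does not check that renaming is consistent, and it cannot recognise Type-3 or Type-4 clones, which the abstract assigns to the PDG-based RCapsNet stage.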

Data availability statement

The processed data required to reproduce these findings cannot be shared at this time as the data also forms part of an ongoing study.

Acknowledgements

This work was supported in part by the State Grid Jiangxi Information & Telecommunication Company Project under Grant 52183520007V.

Author information

Authors and Affiliations

School of Control and Computer Engineering, North China Electric Power University, NO.2 Beinong Road, Beijing, 102206, China
Yuancheng Li, Chaohang Yu & Yaqi Cui

Contributions

Yuancheng Li contributed to the conception of the study.

Chaohang Yu designed the experiment and wrote the manuscript.

Yaqi Cui performed the experiment and the data analyses, and wrote the manuscript.

Corresponding author

Correspondence to Yuancheng Li.

Ethics declarations

Ethical statement

I confirm that there is no misconduct in this manuscript submission and declare that the research satisfies all the requirements of the submission guidelines regarding the ethical responsibilities of authors.

Consent to participate

Consent to participate was obtained from all individual participants included in the study.

Consent for publication

Written informed consent for publication was obtained from all participants.

Conflicts of interest/competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, Y., Yu, C. & Cui, Y. TPCaps: a framework for code clone detection and localization based on improved CapsNet. Appl Intell 53, 16594–16605 (2023). https://doi.org/10.1007/s10489-022-03158-3

Accepted: 29 December 2021
Published: 12 December 2022
Issue Date: July 2023
DOI: https://doi.org/10.1007/s10489-022-03158-3
