
TPCaps: a framework for code clone detection and localization based on improved CapsNet

Applied Intelligence

Abstract

In this paper, we propose TPCaps, a new code clone detection framework that addresses two problems: the inefficiency of semantic clone detection and the difficulty of locating code clones. Built on CapsNet and using both tokens and the Program Dependence Graph (PDG), TPCaps is designed to improve both processing speed and detection capability. First, TPCaps determines tokens through dataset partitioning and semantic signatures, which are used to filter for valid code clones. Using these tokens, it detects Type-1 and Type-2 clones effectively. In addition, TPCaps generates a PDG composed of the data dependencies and control dependencies extracted from the code. Taking the PDG as input, an improved capsule network, which we call RCapsNet, detects Type-3 and Type-4 clones. RCapsNet extends CapsNet with a selective search algorithm combined with a Region Proposal Network (RPN). CapsNet handles the clone features to perform detection and classification, while the RPN processes the location information and trains and updates the candidate frames to obtain the specific location of each clone. In the experimental section, we evaluate the recall and precision of the model. TPCaps achieves higher accuracy than the other models compared.
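The abstract says that tokens drive Type-1 and Type-2 detection but does not give the token scheme. The following is a minimal sketch under stated assumptions: a C-like language, a hypothetical regex lexer, and a small hand-picked keyword set. It is not TPCaps' dataset-partitioning or semantic-signature method. It only checks the standard clone definitions. Type-1 clones have identical token sequences once whitespace, layout, and comments are ignored. Type-2 clones become identical once identifiers and literals are replaced by placeholders.

```python
import re
from typing import List, Optional

# Hypothetical lexer for a C-like language: string literals, identifiers/keywords,
# numbers, then any other single non-space character (operators, punctuation).
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\d+(?:\.\d+)?|\S')
# Comments are removed before tokenizing (naive: ignores "//" inside strings).
COMMENT_RE = re.compile(r'//[^\n]*|/\*.*?\*/', re.DOTALL)
# Small assumed keyword set; keywords are kept verbatim during normalization.
KEYWORDS = {"int", "float", "double", "char", "void", "if", "else", "for", "while", "return"}


def tokenize(code: str) -> List[str]:
    """Strip comments, then return tokens; whitespace and layout are discarded."""
    return TOKEN_RE.findall(COMMENT_RE.sub(" ", code))


def normalize(tokens: List[str]) -> List[str]:
    """Map identifiers to ID and literals to LIT; keep keywords and operators."""
    out = []
    for tok in tokens:
        if tok in KEYWORDS:
            out.append(tok)
        elif tok.startswith('"') or tok[0].isdigit():
            out.append("LIT")
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        else:
            out.append(tok)
    return out


def clone_type(a: str, b: str) -> Optional[str]:
    """Return 'Type-1', 'Type-2', or None for a pair of code fragments."""
    ta, tb = tokenize(a), tokenize(b)
    if ta == tb:
        return "Type-1"  # identical apart from whitespace, layout, comments
    if normalize(ta) == normalize(tb):
        return "Type-2"  # identical after renaming identifiers and literals
    return None


if __name__ == "__main__":
    f1 = "int sum(int a, int b) { return a + b; }"
    f2 = "int  sum(int a,int b){ return a+b; } // add"
    f3 = "int add(int x, int y) { return x + y; }"
    print(clone_type(f1, f2), clone_type(f1, f3))  # Type-1 Type-2
```

Exact sequence equality is the strictest possible test. Token-based detectors commonly relax it with similarity thresholds or indexing to scale to large corpora, which this sketch omits.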
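The abstract does not describe RCapsNet's internal modifications. For orientation only, the sketch below shows two standard building blocks of the original CapsNet of Sabour et al. (2017). The first is the squash nonlinearity v = (‖s‖² / (1 + ‖s‖²)) · s / ‖s‖, which keeps a capsule's direction and maps its length into [0, 1). The second is routing-by-agreement between two capsule layers. The prediction vectors `u_hat` are assumed inputs. In TPCaps they would presumably be derived from PDG features, but the abstract does not say how. Pure Python keeps the example dependency-free.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector) -> Vector:
    """Shrink s to length |s|^2 / (1 + |s|^2) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """u_hat[i][j] is input capsule i's prediction for output capsule j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v: List[Vector] = []
    for _ in range(iterations):
        c = [softmax(row) for row in b]  # coupling coefficients per input capsule
        v = []
        for j in range(n_out):
            # Weighted sum of predictions, then squash to get output capsule j.
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s_j))
        # Raise the logit where a prediction agrees (dot product) with the output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v


if __name__ == "__main__":
    # Two inputs agree on output 0 and contradict each other on output 1.
    preds = [[[2.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, -1.0]]]
    out = dynamic_routing(preds)
    print([round(math.hypot(*vj), 3) for vj in out])
```

In CapsNet the length of each output vector is read as the probability that its entity is present. This is why the capsule receiving consistent predictions ends up long and the contradicted one ends up short.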

Figs. 1–8


Data availability statement

The processed data required to reproduce these findings cannot be shared at this time as the data also forms part of an ongoing study.


Acknowledgements

This work was supported in part by the State Grid Jiangxi Information & Telecommunication Company Project under Grant 52183520007V.

Author information

Contributions

Yuancheng Li contributed to the conception of the study;

Chaohang Yu designed the experiment and wrote the manuscript;

Yaqi Cui performed the experiment and the data analyses and wrote the manuscript.

Corresponding author

Correspondence to Yuancheng Li.

Ethics declarations

Ethical statement

I confirm that there is no misconduct in this manuscript submission and declare that the research satisfies all the requirements of the submission guidelines regarding the ethical responsibilities of authors.

Consent to participate

Consent to participate was obtained from all individual participants included in the study.

Consent for publication

Written informed consent for publication was obtained from all participants.

Conflicts of interest/competing interests 

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Li, Y., Yu, C. & Cui, Y. TPCaps: a framework for code clone detection and localization based on improved CapsNet. Appl Intell 53, 16594–16605 (2023). https://doi.org/10.1007/s10489-022-03158-3



  • DOI: https://doi.org/10.1007/s10489-022-03158-3
