SEED: Semantic Graph Based Deep Detection for Type-4 Clone

Xue, Zhipeng; Jiang, Zhijie; Huang, Chenlin; Xu, Rulin; Huang, Xiangbing; Hu, Liumin

doi:10.1007/978-3-031-08129-3_8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13297)

Included in the following conference series:

International Conference on Software and Software Reuse

Abstract

Type-4 clones are pairs of code snippets that have similar semantics but differ syntactically, which makes them challenging for existing code clone detection techniques. Previous approaches rely heavily on syntactic structures and textual tokens, which cannot precisely represent the semantics of code and may introduce non-negligible noise into detection models. To overcome these limitations, we design a novel semantic graph-based deep detection approach called SEED. For a pair of code snippets, SEED constructs a semantic graph of each snippet from its intermediate representation, which captures code semantics more precisely than representations based on lexical and syntactic analysis. To accommodate the characteristics of Type-4 clones, the semantic graph focuses on operators and API calls rather than all tokens. SEED then generates feature vectors using a graph matching network and performs clone detection based on the similarity between the vectors. Extensive experiments show that our approach significantly outperforms two baseline approaches on two public datasets and one customized dataset. In particular, SEED outperforms the baseline methods by an average of 25.2% in F1-score. Our experiments demonstrate that SEED achieves state-of-the-art performance and can be useful for Type-4 clone detection in practice.
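To make the idea of focusing on operators and API calls concrete, the following is a minimal sketch, not the SEED implementation. It assumes a made-up, simplified instruction format (pairs of opcode and operand) rather than real LLVM or Soot output, and it replaces the learned graph matching network with a plain bag-of-features cosine similarity. The names `IR_LOOP`, `IR_RECURSIVE`, `OPERATORS`, `semantic_features`, and `cosine` are hypothetical and exist only for illustration.

```python
import math
from collections import Counter

# Hypothetical, simplified IR for two snippets that both sum a list and print it:
# one iteratively, one recursively. This is NOT real LLVM or Soot output.
IR_LOOP = [
    ("alloca", "total"), ("store", "0"), ("load", "i"), ("icmp", "lt"),
    ("br", "body"), ("load", "x"), ("add", "total"), ("store", "total"),
    ("call", "print"),
]
IR_RECURSIVE = [
    ("icmp", "eq"), ("br", "base"), ("load", "x"), ("call", "sum_rec"),
    ("add", "ret"), ("call", "print"),
]

# Assumption: the set of opcodes treated as "operators" is chosen for this example.
OPERATORS = {"add", "sub", "mul", "icmp"}


def semantic_features(ir):
    """Count operators and calls only; loads, stores, and branches are dropped."""
    feats = Counter()
    for opcode, operand in ir:
        if opcode in OPERATORS:
            feats[opcode] += 1
        elif opcode == "call":
            feats["call:" + operand] += 1
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in set(a) | set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


score = cosine(semantic_features(IR_LOOP), semantic_features(IR_RECURSIVE))
print(f"similarity: {score:.3f}")
```

Although the two instruction sequences share little surface structure, their filtered features overlap heavily, which is the intuition behind dropping tokens that carry little semantic weight. A real detector would compare graph embeddings against a tuned threshold rather than raw counts.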

Notes

1. https://github.com/xzpxzp123123/SEED
2. https://llvm.org/
3. http://soot-oss.github.io/soot/
4. https://github.com/piyush69/JCoffee
5. http://poj.org/
6. http://codeforces.com/

Acknowledgements

The authors would like to thank the anonymous reviewers for their insightful comments. This work was substantially supported by the National Natural Science Foundation of China (Nos. 61872373 and 61872375).

Author information

Authors and Affiliations

National University of Defense Technology, Changsha, China
Zhipeng Xue, Zhijie Jiang, Chenlin Huang, Rulin Xu, Xiangbing Huang & Liumin Hu

Corresponding authors

Correspondence to Zhijie Jiang or Chenlin Huang.

Editor information

Editors and Affiliations

University of Namur, Namur, Belgium
Gilles Perrouin
École de Technologie Supérieure, Montreal, QC, Canada
Naouel Moha
University of Montpellier, Montpellier, France
Abdelhak-Djamel Seriai

About this paper

Cite this paper

Xue, Z., Jiang, Z., Huang, C., Xu, R., Huang, X., Hu, L. (2022). SEED: Semantic Graph Based Deep Detection for Type-4 Clone. In: Perrouin, G., Moha, N., Seriai, AD. (eds) Reuse and Software Quality. ICSR 2022. Lecture Notes in Computer Science, vol 13297. Springer, Cham. https://doi.org/10.1007/978-3-031-08129-3_8

DOI: https://doi.org/10.1007/978-3-031-08129-3_8
Published: 10 June 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08128-6
Online ISBN: 978-3-031-08129-3
eBook Packages: Computer Science, Computer Science (R0)
