Abstract
In this paper, we propose a collaborative method that analyzes both code structure and function semantics for code comparison. First, we build the function call graph of the code and use a graph auto-encoder to obtain its structure semantics. Then, we obtain the function semantics from the names and definitions of the library functions and built-in classes used in the code. Finally, we integrate the structure and function semantics to collaboratively analyze the similarity between pieces of code. We validate our method on several real code datasets, and the experimental results show that it outperforms the baselines. The ablation experiments show that the function call structure contributes the most to the performance. We also visualize the semantics of function structures to illustrate that the proposed method can extract the correlations and differences between pieces of code.
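The three steps in the abstract (call graph, function semantics, integration) can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: the graph auto-encoder is replaced here by a plain degree profile of the call graph, the function semantics by the set of external names a program calls, and the integration by a weighted sum. The helper names (analyze, degree_profile, similarity) and the weight alpha are hypothetical, and the sketch assumes Python source code as input.

```python
import ast
from collections import Counter


def _callee_name(func):
    """Best-effort name of a called object: foo(...) or obj.foo(...)."""
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None


def analyze(source):
    """Return (user-defined functions, call-graph edges, external names called)."""
    tree = ast.parse(source)
    defined = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
    edges, external = set(), set()
    for fn in tree.body:
        if not isinstance(fn, ast.FunctionDef):
            continue
        for node in ast.walk(fn):
            if not isinstance(node, ast.Call):
                continue
            name = _callee_name(node.func)
            if name is None:
                continue
            if name in defined:
                edges.add((fn.name, name))   # structure: user function calls user function
            else:
                external.add(name)           # stand-in for library/built-in usage
    return defined, edges, external


def degree_profile(defined, edges):
    """Name-independent summary of the call graph: multiset of (in, out) degrees."""
    out_deg = Counter(caller for caller, _ in edges)
    in_deg = Counter(callee for _, callee in edges)
    return Counter((in_deg[f], out_deg[f]) for f in defined)


def multiset_jaccard(a, b):
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 1.0


def set_jaccard(a, b):
    return len(a & b) / len(a | b) if (a | b) else 1.0


def similarity(src_a, src_b, alpha=0.5):
    """Weighted combination of structural and functional similarity (alpha is an assumption)."""
    da, ea, xa = analyze(src_a)
    db, eb, xb = analyze(src_b)
    structural = multiset_jaccard(degree_profile(da, ea), degree_profile(db, eb))
    functional = set_jaccard(xa, xb)
    return alpha * structural + (1 - alpha) * functional
```

Because the degree profile ignores function names, two programs that differ only in how their functions are named score as identical under this sketch, whereas a learned embedding such as the one described above would capture far richer structure.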
Acknowledgments
This work was supported by the Major Project of NSF Shandong Province under Grant No. ZR2018ZB0420 and the Key Research and Development Program of Shandong Province under Grant No. 2019JZZY010107.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ning, X., Wu, H., Wan, L., Gong, B., Sun, Y. (2023). Collaborative Analysis on Code Structure and Semantics. In: Sun, Y., et al. Computer Supported Cooperative Work and Social Computing. ChineseCSCW 2022. Communications in Computer and Information Science, vol 1682. Springer, Singapore. https://doi.org/10.1007/978-981-99-2385-4_6
DOI: https://doi.org/10.1007/978-981-99-2385-4_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-2384-7
Online ISBN: 978-981-99-2385-4
eBook Packages: Computer Science, Computer Science (R0)