Code Representation Based on Hybrid Graph Modelling

Wu, Qiong; Jiang, Xue; Zheng, Zhuoran; Gao, Xuejian; Lyu, Chen; Lyu, Lei

doi:10.1007/978-3-030-92307-5_35

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1516)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Several sequence-based and abstract syntax tree (AST)-based models have been proposed for modelling the lexical-level and syntactic-level information of source code. However, an effective method for learning code semantic information is still lacking. We therefore propose HGCR, a novel code representation method based on hybrid graph modelling that serves as a code information extraction model. In HGCR, two novel graphs, the Structure Graph (SG) and the Execution Data Flow Graph (EDFG), are first extracted from the AST to model the syntactic structural information and the semantic information of source code, respectively. Two improved graph neural networks are then applied to these graphs to learn an effective code representation. We demonstrate the effectiveness of our model on two common code understanding tasks, code classification and code clone detection, where it empirically outperforms state-of-the-art models.
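
The abstract does not define how the SG and the EDFG are constructed, so the following is only a rough sketch of the general idea of deriving two graphs from one AST, not the authors' method. It assumes Python source parsed with the standard `ast` module, treats parent-to-child AST links as "structure" edges, and approximates "data flow" by linking each variable read to the most recent earlier write in straight-line code with one statement per line. The function name `build_graphs` and both edge definitions are illustrative assumptions.

```python
import ast


def build_graphs(source: str):
    """Hypothetical helper: derive two edge lists from one Python AST.

    Returns (nodes, structure_edges, dataflow_edges), where edges are
    pairs of indices into `nodes`. This is an illustration only and is
    not the SG/EDFG construction used by HGCR.
    """
    tree = ast.parse(source)

    # Keep module, statement, and expression nodes. Context and operator
    # nodes (Load, Store, Add, ...) may be shared objects, so skip them.
    kept = (ast.mod, ast.stmt, ast.expr)
    nodes = [n for n in ast.walk(tree) if isinstance(n, kept)]
    index = {id(n): i for i, n in enumerate(nodes)}

    # Structure edges: one edge per parent -> child link in the AST.
    structure_edges = [
        (index[id(parent)], index[id(child)])
        for parent in nodes
        for child in ast.iter_child_nodes(parent)
        if id(child) in index
    ]

    # Simplified data-flow edges: within a line, reads are handled before
    # writes, so `x = x + 1` reads the previous value of x.
    names = sorted(
        (n for n in nodes if isinstance(n, ast.Name)),
        key=lambda n: (n.lineno, isinstance(n.ctx, ast.Store), n.col_offset),
    )
    last_write = {}
    dataflow_edges = []
    for name in names:
        if isinstance(name.ctx, ast.Store):
            last_write[name.id] = index[id(name)]
        elif isinstance(name.ctx, ast.Load) and name.id in last_write:
            dataflow_edges.append((last_write[name.id], index[id(name)]))

    return nodes, structure_edges, dataflow_edges


# Example usage on a three-line snippet.
nodes, structure_edges, dataflow_edges = build_graphs("a = 1\nb = a + 2\nc = a + b\n")
for write, read in dataflow_edges:
    print(f"{nodes[write].id}: line {nodes[write].lineno} -> line {nodes[read].lineno}")
```

Each edge list could then be given to its own graph neural network, which is the role the abstract assigns to the two improved networks. Branches, loops, and function calls are deliberately ignored here and would need a proper control-flow and data-flow analysis.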

Acknowledgement

This work is financially supported by the National Natural Science Foundation of China (61602286, 61976127) and the Special Project on Innovative Methods (2020IM020100).

Author information

Authors and Affiliations

School of Information Science and Engineering, Shandong Normal University, Jinan, China
Qiong Wu, Xue Jiang, Xuejian Gao, Chen Lyu & Lei Lyu
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
Zhuoran Zheng

Corresponding author

Correspondence to Chen Lyu.

Editor information

Editors and Affiliations

Sampoerna University, Jakarta, Indonesia
Teddy Mantoro
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee
Sampoerna University, Jakarta, Indonesia
Media Anugerah Ayu
Murdoch University, Murdoch, WA, Australia
Kok Wai Wong
Universitas Indonesia, Depok, Indonesia
Achmad Nizar Hidayanto

About this paper

Cite this paper

Wu, Q., Jiang, X., Zheng, Z., Gao, X., Lyu, C., Lyu, L. (2021). Code Representation Based on Hybrid Graph Modelling. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Communications in Computer and Information Science, vol 1516. Springer, Cham. https://doi.org/10.1007/978-3-030-92307-5_35

DOI: https://doi.org/10.1007/978-3-030-92307-5_35
Published: 02 December 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92306-8
Online ISBN: 978-3-030-92307-5
eBook Packages: Computer Science, Computer Science (R0)
