Abstract
Several sequence-based and abstract syntax tree (AST)-based models have been proposed to capture the lexical and syntactic information of source code. However, an effective method for learning code semantics is still lacking. We therefore propose HGCR, a code information extraction model that builds a novel code representation through hybrid graph modelling. Specifically, in HGCR, two novel graphs, the Structure Graph (SG) and the Execution Data Flow Graph (EDFG), are first extracted from the AST to model the syntactic structure and the semantic information of source code, respectively. Two improved graph neural networks then learn from these graphs to produce an effective code representation. We demonstrate the effectiveness of our model on two common code understanding tasks: code classification and code clone detection. Empirically, our model outperforms state-of-the-art models.
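To make the idea of deriving two graphs from one AST concrete, the following is a minimal sketch and not the construction used in HGCR; the abstract does not define SG or EDFG in detail. It assumes Python's standard `ast` module as a stand-in parser, treats parent-child AST links as "structure" edges, and treats links from a variable's most recent assignment to its later reads as "data flow" edges. The function name `build_graphs` and both edge definitions are illustrative assumptions, and the data-flow pass only handles straight-line, top-level statements.

```python
import ast


def build_graphs(source):
    """Return (labels, structure_edges, flow_edges) for a Python snippet.

    Illustrative only: these are simplified stand-ins, not the paper's
    Structure Graph or Execution Data Flow Graph.
    """
    tree = ast.parse(source)
    labels = []            # node id -> AST node type name
    structure_edges = []   # (parent id, child id)
    name_ids = {}          # id(Name node) -> node id, used for flow edges

    def visit(node, parent_id):
        # Number every node occurrence in depth-first pre-order.
        node_id = len(labels)
        labels.append(type(node).__name__)
        if parent_id is not None:
            structure_edges.append((parent_id, node_id))
        if isinstance(node, ast.Name):
            name_ids[id(node)] = node_id
        for child in ast.iter_child_nodes(node):
            visit(child, node_id)

    visit(tree, None)

    # Assumption: straight-line code. Within each top-level statement,
    # reads are linked to the previous definition before the statement's
    # own writes are recorded, so `a = a + b` reads the earlier `a`.
    flow_edges = []        # (defining Name id, reading Name id)
    last_def = {}          # variable name -> node id of latest assignment
    for stmt in tree.body:
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        for n in names:
            if isinstance(n.ctx, ast.Load) and n.id in last_def:
                flow_edges.append((last_def[n.id], name_ids[id(n)]))
        for n in names:
            if isinstance(n.ctx, ast.Store):
                last_def[n.id] = name_ids[id(n)]

    return labels, structure_edges, flow_edges


if __name__ == "__main__":
    labels, structure_edges, flow_edges = build_graphs(
        "a = 1\nb = a + 2\na = a + b\n"
    )
    print(len(labels), "nodes")
    print("structure edges:", structure_edges)
    print("data-flow edges:", flow_edges)
```

In a pipeline like the one the abstract describes, each edge list would be handed to its own graph neural network and the two resulting embeddings combined; that learning step is outside the scope of this sketch.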
Acknowledgement
This work is financially supported by the National Natural Science Foundation of China (61602286, 61976127) and the Special Project on Innovative Methods (2020IM020100).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, Q., Jiang, X., Zheng, Z., Gao, X., Lyu, C., Lyu, L. (2021). Code Representation Based on Hybrid Graph Modelling. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Communications in Computer and Information Science, vol 1516. Springer, Cham. https://doi.org/10.1007/978-3-030-92307-5_35
DOI: https://doi.org/10.1007/978-3-030-92307-5_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92306-8
Online ISBN: 978-3-030-92307-5
eBook Packages: Computer Science (R0)