
Code comment generation based on graph neural network enhanced transformer model for code understanding in open-source software ecosystems

Published in Automated Software Engineering

Abstract

In open-source software ecosystems, codebases keep growing in scale, and developers often rely on aids such as good code comments or descriptive method names to make code easier to read and understand. However, in ecosystems such as GitHub, high-quality comments and method names are often missing due to tight project schedules or other reasons. In this work, we therefore use deep learning models to generate appropriate code comments and method names to support software development and maintenance, a task that requires a non-trivial understanding of the code. We propose a Graph neural network enhanced Transformer model (GTrans for short) that learns code representations from both code sequences and code graphs. Specifically, a Transformer encoder captures the global representation of the code sequence, a graph neural network (GNN) encoder focuses on the local details in the code graph, and a decoder combines the global and local representations through an attention mechanism. We evaluate our model on three public datasets collected from GitHub. In an extensive evaluation, GTrans outperforms state-of-the-art models by up to 3.8% in METEOR on code comment generation and, after some adjustments to its structure, by margins of 5.8%–9.4% in ROUGE on method name generation. Empirically, we find that method name generation depends more on local information, whereas code comment generation depends more on global information. Our data and code are available at https://github.com/zc-work/GTrans.
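To make the architecture concrete, the following is a minimal PyTorch sketch of the dual-encoder design described above. It is an illustration under our own assumptions, not the authors' implementation (see the GitHub repository linked above): the module names, layer sizes, the GRU-style node update standing in for the GGNN, and the concatenation-based fusion in the decoder are all simplifications.

import torch
import torch.nn as nn

class DualEncoderSketch(nn.Module):
    """Illustrative only: a Transformer encoder for the token sequence,
    a message-passing encoder for the code graph, and a decoder that
    attends over both memories."""

    def __init__(self, vocab_size, d_model=512, nhead=8, layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Global view: standard Transformer encoder over the code tokens.
        self.seq_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
            num_layers=layers)
        # Local view: GRU-style node update over the code graph
        # (a stand-in for the GGNN used in the paper).
        self.node_update = nn.GRUCell(d_model, d_model)
        # The decoder cross-attends over the fused memory.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True),
            num_layers=layers)
        self.out = nn.Linear(d_model, vocab_size)

    def propagate(self, nodes, adj, steps=5):
        # 'steps' plays the role of T, the number of unrolled timesteps;
        # note 2 below reports that 5 steps suffice.
        for _ in range(steps):
            msgs = adj @ nodes  # aggregate neighbour states
            nodes = self.node_update(
                msgs.flatten(0, 1), nodes.flatten(0, 1)).view_as(nodes)
        return nodes

    def forward(self, tokens, node_tokens, adj, tgt):
        global_mem = self.seq_encoder(self.embed(tokens))
        local_mem = self.propagate(self.embed(node_tokens), adj)
        # Fuse global and local representations by letting the decoder
        # attend over both memories at once (causal masking of the
        # target is omitted for brevity).
        memory = torch.cat([global_mem, local_mem], dim=1)
        return self.out(self.decoder(self.embed(tgt), memory))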


Notes

  1. The multi-head self-attention layer in the decoder is not equipped with relative position encoding.

  2. We found that 5 propagation steps are enough; using more steps does not help.


Acknowledgements

This work has been supported by the National Key R&D Program of China under Grant 2018YFB1402800, National Natural Science Foundation of China (Nos. 61772560, 61902236), Fundamental Research Funds for the Central Universities of Central South University (Grant No. 2021zzts0725). We are grateful for resources from the High Performance Computing Center of Central South University.

Author information

Corresponding authors

Correspondence to Cong Zhou or Xiaoxian Yang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Hyper-parameters

Table 5 shows the hyper-parameters that we used in our experiments.

Table 5 The hyper-parameters that we used in our experiments

Here, k refers to the clipping distance in the relative position representations of the Transformer, and T denotes the number of unrolled timesteps of the GGNN.
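As a concrete illustration of k, here is a small sketch, assuming PyTorch, of how clipped relative offsets are typically turned into embedding indices in relative position representations; the function name and the printed example are our own, and T corresponds to the steps argument of the propagation loop in the sketch after the abstract.

import torch

def relative_position_ids(seq_len, k):
    # Offset j - i for every pair of positions, clipped to [-k, k] and
    # shifted into [0, 2k] so it can index a learned embedding table of
    # size 2k + 1. Pairs farther apart than k share a bucket.
    pos = torch.arange(seq_len)
    rel = pos[None, :] - pos[:, None]
    return rel.clamp(-k, k) + k

print(relative_position_ids(5, k=2))  # e.g., row 0: [2, 3, 4, 4, 4]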

Appendix B: The modification in the encoder

Fig. 7 shows the modifications to the encoder.

Fig. 7 Left: the encoder-parallel model. Right: the encoder-cascaded model. We also tried the parallel combination in the decoder, but it still performed worse than GTrans.
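For readers without the figure, the structural difference between the two variants can be sketched in a few lines; the function and argument names here are hypothetical, with seq_enc and graph_enc standing for the Transformer and GNN encoders from the sketch after the abstract.

def encode_parallel(tokens, graph_nodes, seq_enc, graph_enc, fuse):
    # Encoder-parallel: both encoders run independently on their own
    # inputs, and the two outputs are fused afterwards.
    return fuse(seq_enc(tokens), graph_enc(graph_nodes))

def encode_cascaded(tokens, graph_nodes, seq_enc, graph_enc, inject):
    # Encoder-cascaded: the sequence encoder runs first, and its output
    # is injected into the graph encoder's node states before message
    # passing.
    return graph_enc(inject(graph_nodes, seq_enc(tokens)))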

Appendix C: More examples

Figure a: more examples.


About this article


Cite this article

Kuang, L., Zhou, C. & Yang, X. Code comment generation based on graph neural network enhanced transformer model for code understanding in open-source software ecosystems. Autom Softw Eng 29, 43 (2022). https://doi.org/10.1007/s10515-022-00341-1
