Abstract
Appropriate comments for code snippets provide insight into code functionality and are helpful for program comprehension. However, due to the high cost of authoring comments, many code projects do not contain adequate comments. Automatic comment generation techniques have been proposed to generate comments from pieces of code in order to reduce the human effort of annotating code. Most existing approaches attempt to exploit certain correlations (usually manually specified) between code and generated comments, which can easily be violated when coding patterns change, causing the performance of comment generation to decline. In addition, recent approaches ignore the code constructs and treat code snippets like plain text. Furthermore, previous datasets are too small to validate these methods and demonstrate their advantages. In this paper, we propose a new attention mechanism called CodeAttention to translate code to comments, which is able to utilize code constructs such as critical statements, symbols and keywords. By focusing on these specific points, CodeAttention can understand the semantic meaning of code better than previous methods. To verify our approach on a wider range of coding patterns, we build a large dataset from open projects on GitHub. Experimental results on this large dataset demonstrate that the proposed method outperforms existing approaches in both objective and subjective evaluation. We also perform ablation studies to determine the effects of the different components of CodeAttention.
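The core idea sketched in the abstract — attention that gives extra focus to code constructs such as keywords and symbols — can be illustrated with a minimal example. The snippet below is a simplified sketch, not the paper's actual architecture: the token sets, bias values, and function names are illustrative assumptions, standing in for the learned construct-aware weighting that CodeAttention performs.

```python
import math

# Hypothetical token classes (assumed for illustration; the paper's model
# learns its treatment of constructs rather than using fixed biases).
JAVA_KEYWORDS = {"public", "static", "if", "return", "for", "while", "int", "void"}
SYMBOLS = {"{", "}", "(", ")", ";", "=", "+", "-", "*", "/"}

def attention_weights(scores, tokens, keyword_bias=1.0, symbol_bias=0.5):
    """Softmax over raw alignment scores, with an additive bias that
    shifts attention toward keyword and symbol tokens."""
    biased = []
    for s, tok in zip(scores, tokens):
        if tok in JAVA_KEYWORDS:
            s += keyword_bias
        elif tok in SYMBOLS:
            s += symbol_bias
        biased.append(s)
    # Numerically stable softmax over the biased scores.
    m = max(biased)
    exps = [math.exp(s - m) for s in biased]
    total = sum(exps)
    return [e / total for e in exps]

# With uniform raw scores, the bias alone makes the keyword "public"
# and the parenthesis receive more attention than the identifier "add".
tokens = ["public", "int", "add", "(", "a", ")"]
w = attention_weights([0.0] * len(tokens), tokens)
```

In the actual model, the raw scores would come from a learned compatibility function between the decoder state and each encoded code token; the fixed biases here merely visualize why construct-aware attention concentrates probability mass on statements, symbols and keywords.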
Acknowledgements
This research was supported by the National Key Research and Development Program (2017YFB1001903) and the National Natural Science Foundation of China (Grant No. 61422304).
Author information
Wenhao Zheng received the BS degree in Computer Science and Technology from Nanjing University, China in 2015. In the same year, he was admitted to the MS program at Nanjing University without entrance examination. He is currently a member of the LAMDA group. His research interests mainly include machine learning and software mining. Mr. Zheng received the National Scholarship in 2014.
Hongyu Zhou received his BE degree from Wuhan University, China in 2015. He is currently a postgraduate student in the Department of Computer Science and Technology, Nanjing University, China. His research interests include computer vision and machine learning.
Ming Li received the BS and PhD degrees in computer science from Nanjing University, China in 2003 and 2008, respectively. He is currently an assistant professor with the LAMDA Group, Department of Computer Science and Technology, Nanjing University. His major research interests include machine learning, data mining and information retrieval, especially learning with labeled and unlabeled data. He has been granted various awards including the CCF Outstanding Doctoral Dissertation Award (2009), the Microsoft Fellowship Award (2005), etc. He has served on the program committees of a number of important international conferences including KDD'10, ACML'10, ACML'09, ACM CIKM'09, IEEE ICME'10, AI'10, etc., and served as a reviewer for a number of journals including IEEE Trans. KDE, IEEE Trans. NN, IEEE Trans. SMCC, ACM Trans. IST, Pattern Recognition, Knowledge and Information Systems, Journal of Computer Science and Technology, etc. He is a committee member of the CAAI machine learning society, and a member of ACM, IEEE, the IEEE Computer Society, CCF and CAAI.
Jianxin Wu received his BS and MS degrees in computer science from Nanjing University, and his PhD degree in computer science from the Georgia Institute of Technology. He is currently a professor in the Department of Computer Science and Technology at Nanjing University, China, and is associated with the National Key Laboratory for Novel Software Technology, China.
Cite this article
Zheng, W., Zhou, H., Li, M. et al. CodeAttention: translating source code to comments by exploiting the code constructs. Front. Comput. Sci. 13, 565–578 (2019). https://doi.org/10.1007/s11704-018-7457-6