ABSTRACT
Code summarization aims to generate natural language descriptions of source code, which can greatly aid program comprehension and software maintenance. Current code summarization approaches have made progress using neural networks. However, most of these methods focus on learning the semantics and syntax of source code snippets and ignore the dependencies between pieces of code. In this paper, we propose a novel neural-network-based method that uses knowledge of the call dependencies between a piece of source code and its related code. We extract call dependencies from the source code, transform them into a token sequence of method names, and apply a Seq2Seq model to code summarization using the combination of source code and call dependency information. For the experiments, we collected about 100,000 code samples from 1,000 open-source Java projects on GitHub. The large-scale experiment shows that considering not only the code itself but also the code it calls improves the code summarization model, which reaches a BLEU score of 33.08.
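The input construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a simple regular expression is enough to find call sites (a real pipeline would more likely use a Java parser), and that the code tokens and the called method names are joined into one sequence with a separator token. The helper names, the `<sep>` token, and the concatenation scheme are all hypothetical.

```python
import re
from typing import List

# Java keywords that may be followed by "(" but are not method calls.
JAVA_KEYWORDS = {"if", "for", "while", "switch", "catch", "return",
                 "synchronized", "super", "this"}

# An identifier directly followed by "(" is treated as a call site.
# Limitation of this sketch: constructor calls such as `new Foo(...)`
# are also matched.
CALL_PATTERN = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def extract_called_methods(method_body: str) -> List[str]:
    """Return the names of methods invoked in a Java method body,
    in order of first appearance and without duplicates."""
    seen: List[str] = []
    for name in CALL_PATTERN.findall(method_body):
        if name not in JAVA_KEYWORDS and name not in seen:
            seen.append(name)
    return seen


def tokenize_code(method_body: str) -> List[str]:
    """Split code into identifiers, numbers, and single punctuation marks."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", method_body)


def build_model_input(code_tokens: List[str], called_methods: List[str],
                      sep: str = "<sep>") -> List[str]:
    """Join code tokens and call-dependency tokens into one sequence.
    The separator-based concatenation is an assumption of this sketch."""
    return code_tokens + [sep] + called_methods


# Hypothetical Java method body used only for illustration.
java_body = "String text = readFile(path); log(text); return parseJson(text);"

calls = extract_called_methods(java_body)
model_input = build_model_input(tokenize_code(java_body), calls)
```

Here `calls` holds the method names `readFile`, `log`, and `parseJson`, and `model_input` is the token sequence that a Seq2Seq encoder would consume under these assumptions.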
Index Terms
- A Neural-Network based Code Summarization Approach by Using Source Code and its Call Dependencies