A Neural-Network based Code Summarization Approach by Using Source Code and its Call Dependencies

Authors:

Jinsheng Deng

Internetware '19: Proceedings of the 11th Asia-Pacific Symposium on Internetware

Article No.: 12, Pages 1 - 10

https://doi.org/10.1145/3361242.3362774

Published: 28 October 2019

Abstract

Code summarization aims to generate natural-language summaries of source code, which can greatly help program comprehension and software maintenance. Current code summarization approaches have made progress using neural networks. However, most of these methods focus on learning the semantics and syntax of individual source code snippets and ignore the dependencies between pieces of code. In this paper, we propose a novel neural-network-based method that uses knowledge of the call dependencies between a piece of source code and its related code. We extract call dependencies from the source code, transform them into a token sequence of method names, and apply the Seq2Seq model to code summarization using the combination of source code and call dependency information. For the experiments, we collected about 100,000 code samples from 1,000 open-source Java projects on GitHub. The large-scale experiment shows that considering not only the code itself but also the code it calls improves the code summarization model, which reaches a BLEU score of 33.08.
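As a rough illustration of the input construction described in the abstract, the sketch below builds one token sequence from a Java method and the names of the methods it calls. It is not the authors' implementation: the regex-based extraction stands in for a real Java parser, and the `<calls>` separator, the keyword list, and all function names are assumptions made for this example.

```python
import re

# Hypothetical separator marking where the call-dependency tokens begin.
SEP = "<calls>"

# Keywords that can appear directly before "(" but are not method calls.
JAVA_KEYWORDS = {"if", "for", "while", "switch", "catch", "return",
                 "new", "synchronized", "super", "this"}


def tokenize_code(source: str) -> list[str]:
    """Split Java source into identifier, number, and punctuation tokens."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source)


def called_method_names(source: str, own_name: str) -> list[str]:
    """Return identifiers that directly precede '(' in order of appearance,
    skipping keywords and the method's own name (its declaration)."""
    names = re.findall(r"([A-Za-z_]\w*)\s*\(", source)
    return [n for n in names if n not in JAVA_KEYWORDS and n != own_name]


def build_input(source: str, own_name: str) -> list[str]:
    """Concatenate code tokens, a separator, and the called method names."""
    return tokenize_code(source) + [SEP] + called_method_names(source, own_name)


example = "int total(List<Item> items) { return sumPrices(items) + loadTax(); }"
print(build_input(example, "total")[-3:])  # ['<calls>', 'sumPrices', 'loadTax']
```

The resulting list could then be mapped to vocabulary indices and fed to a sequence-to-sequence encoder. Because the regex only approximates call extraction, a production pipeline would more plausibly resolve calls with a proper parser.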

Published In

Internetware '19: Proceedings of the 11th Asia-Pacific Symposium on Internetware

October 2019

179 pages

ISBN: 9781450377010

DOI: 10.1145/3361242

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Grand R&D Plan
National Natural Science Foundation of China

Conference

Internetware '19

Internetware '19: The 11th Asia-Pacific Symposium on Internetware

October 28 - 29, 2019

Fukuoka, Japan

Acceptance Rates

Internetware '19 Paper Acceptance Rate 20 of 35 submissions, 57%;

Overall Acceptance Rate 55 of 111 submissions, 50%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 14
Total Downloads: 343
Downloads (Last 12 months): 36
Downloads (Last 6 weeks): 2

Reflects downloads up to 03 Mar 2025

Citations

Cited By

Zhou Z, Li M, Yu H, Fan G, Yang P, Huang Z. (2024). Learning to Generate Structured Code Summaries From Hybrid Code Context. IEEE Transactions on Software Engineering 50:10 (2512-2528). Online publication date: 1-Oct-2024
https://dl.acm.org/doi/10.1109/TSE.2024.3439562
Zhu J, Miao Y, Xu T, Zhu J, Sun X. (2024). On the Effectiveness of Large Language Models in Statement-level Code Summarization. 2024 IEEE 24th International Conference on Software Quality, Reliability and Security (QRS) (216-227). Online publication date: 1-Jul-2024
https://doi.org/10.1109/QRS62785.2024.00030
Shruthi D, Chethan H, Agughasi V. (2024). Effective Approach for Fine-Tuning Pre-Trained Models for the Extraction of Texts From Source Codes. ITM Web of Conferences 65 (03004). Online publication date: 16-Jul-2024
https://doi.org/10.1051/itmconf/20246503004
Sharma T, Kechagia M, Georgiou S, Tiwari R, Vats I, Moazen H, Sarro F. (2024). A survey on machine learning techniques applied to source code. Journal of Systems and Software 209:C. Online publication date: 14-Mar-2024
https://dl.acm.org/doi/10.1016/j.jss.2023.111934
Zhang X, Hou X, Qiao X, Song W. (2024). A review of automatic source code summarization. Empirical Software Engineering 29:6. Online publication date: 7-Oct-2024
https://dl.acm.org/doi/10.1007/s10664-024-10553-6
Bastías O, Díaz J, López Fenner J. (2023). Exploring the Intersection between Software Maintenance and Machine Learning—A Systematic Mapping Study. Applied Sciences 13:3 (1710). Online publication date: 29-Jan-2023
https://doi.org/10.3390/app13031710
Rai S, Belwal R, Gupta A. (2022). A Review on Source Code Documentation. ACM Transactions on Intelligent Systems and Technology 13:5 (1-44). Online publication date: 22-Mar-2022
https://dl.acm.org/doi/10.1145/3519312
Lin L, Huang Z, Yu Y, Liu Y. (2022). Multi-Modal Code Summarization with Retrieved Summary. 2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM) (132-142). Online publication date: Oct-2022
https://doi.org/10.1109/SCAM55253.2022.00020
Yu C, Yang G, Chen X, Liu K, Zhou Y. (2022). BashExplainer: Retrieval-Augmented Bash Code Comment Generation based on Fine-tuned CodeBERT. 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME) (82-93). Online publication date: Oct-2022
https://doi.org/10.1109/ICSME55016.2022.00016
Roy D, Fakhoury S, Arnaoudova V, Spinellis D, Gousios G, Chechik M, Di Penta M. (2021). Reassessing automatic evaluation metrics for code summarization tasks. Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (1105-1116). Online publication date: 20-Aug-2021
https://dl.acm.org/doi/10.1145/3468264.3468588
