DOI: 10.1145/3361242.3362774
research-article

A Neural-Network based Code Summarization Approach by Using Source Code and its Call Dependencies

Published: 28 October 2019

Abstract

Code summarization aims to generate natural language descriptions of source code, which can greatly help program comprehension and software maintenance. Current code summarization approaches have made progress using neural networks. However, most of these methods focus on learning the semantics and syntax of individual source code snippets and ignore the dependencies between pieces of code. In this paper, we propose a novel neural-network-based method that uses knowledge of the call dependencies between a piece of source code and its related code. We extract call dependencies from the source code, transform them into a token sequence of method names, and apply a Seq2Seq model to code summarization using the combination of source code and call dependency information. About 100,000 code samples were collected from 1,000 open source Java projects on GitHub for the experiment. This large-scale experiment shows that by considering not only the code itself but also the code it calls, the code summarization model can be improved, reaching a BLEU score of 33.08.
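The input construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names, the `<sep>` separator token, and the regex-based call extraction are all assumptions made for illustration. A real pipeline would more likely use a proper Java parser and resolve calls against the rest of the project.

```python
import re
from typing import List

# Keywords that can appear directly before "(" without being method calls.
JAVA_KEYWORDS = {"if", "for", "while", "switch", "catch", "return", "synchronized"}

# An identifier immediately followed by "(" (assumption: a regex stands in for a parser).
CALL_PATTERN = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def extract_called_methods(java_method: str) -> List[str]:
    """Return the names of methods called in the body, in order of appearance.

    Limitation of this sketch: constructor calls such as `new Foo()` are
    also picked up, and calls are not resolved to their declarations.
    """
    # Skip the signature so the method's own name is not counted as a call.
    body = java_method[java_method.index("{") + 1:]
    return [m.group(1) for m in CALL_PATTERN.finditer(body)
            if m.group(1) not in JAVA_KEYWORDS]


def build_model_input(java_method: str, sep: str = "<sep>") -> List[str]:
    """Concatenate code tokens and called-method names into one token sequence."""
    # Crude tokenizer: identifiers, numbers, and single punctuation characters.
    code_tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", java_method)
    return code_tokens + [sep] + extract_called_methods(java_method)


example = """
public String load(String path) {
    Reader reader = openReader(path);
    String text = readAll(reader);
    reader.close();
    return text;
}
"""

print(extract_called_methods(example))
print(build_model_input(example))
```

The combined sequence would then be fed to the encoder of a sequence-to-sequence model. Whether the code and call tokens share one encoder or use separate ones is not stated in the abstract, so the single concatenated sequence here is only one possible design.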

Published In

Internetware '19: Proceedings of the 11th Asia-Pacific Symposium on Internetware
October 2019
179 pages
ISBN: 9781450377010
DOI: 10.1145/3361242
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Call Dependency
  2. Code Summarization
  3. Neural Network
  4. Open Source

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • National Grand R&D Plan
  • National Natural Science Foundation of China

Conference

Internetware '19

Acceptance Rates

Internetware '19 Paper Acceptance Rate 20 of 35 submissions, 57%;
Overall Acceptance Rate 55 of 111 submissions, 50%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 36
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 03 Mar 2025

Citations

Cited By

  • (2024) Learning to Generate Structured Code Summaries From Hybrid Code Context. IEEE Transactions on Software Engineering 50:10 (2512-2528). DOI: 10.1109/TSE.2024.3439562. Online publication date: 1-Oct-2024
  • (2024) On the Effectiveness of Large Language Models in Statement-level Code Summarization. 2024 IEEE 24th International Conference on Software Quality, Reliability and Security (QRS) (216-227). DOI: 10.1109/QRS62785.2024.00030. Online publication date: 1-Jul-2024
  • (2024) Effective Approach for Fine-Tuning Pre-Trained Models for the Extraction of Texts From Source Codes. ITM Web of Conferences 65 (03004). DOI: 10.1051/itmconf/20246503004. Online publication date: 16-Jul-2024
  • (2024) A survey on machine learning techniques applied to source code. Journal of Systems and Software 209:C. DOI: 10.1016/j.jss.2023.111934. Online publication date: 14-Mar-2024
  • (2024) A review of automatic source code summarization. Empirical Software Engineering 29:6. DOI: 10.1007/s10664-024-10553-6. Online publication date: 7-Oct-2024
  • (2023) Exploring the Intersection between Software Maintenance and Machine Learning—A Systematic Mapping Study. Applied Sciences 13:3 (1710). DOI: 10.3390/app13031710. Online publication date: 29-Jan-2023
  • (2022) A Review on Source Code Documentation. ACM Transactions on Intelligent Systems and Technology 13:5 (1-44). DOI: 10.1145/3519312. Online publication date: 22-Mar-2022
  • (2022) Multi-Modal Code Summarization with Retrieved Summary. 2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM) (132-142). DOI: 10.1109/SCAM55253.2022.00020. Online publication date: Oct-2022
  • (2022) BashExplainer: Retrieval-Augmented Bash Code Comment Generation based on Fine-tuned CodeBERT. 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME) (82-93). DOI: 10.1109/ICSME55016.2022.00016. Online publication date: Oct-2022
  • (2021) Reassessing automatic evaluation metrics for code summarization tasks. Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (1105-1116). DOI: 10.1145/3468264.3468588. Online publication date: 20-Aug-2021
