research-article

API + code = better code summary? insights from an exploratory study

Authors:
Prantik Parashar Sarmah, IIT Tirupati, India
Sridhar Chimalakonda, IIT Tirupati, India

PROMISE 2022: Proceedings of the 18th International Conference on Predictive Models and Data Analytics in Software Engineering, November 2022, Pages 92–101, https://doi.org/10.1145/3558489.3559075

Published: 09 November 2022

ABSTRACT

Automatic code summarization techniques aid program comprehension by generating a natural language summary from source code. Research in this area has progressed from basic Seq2Seq models to various transformer models that encode the structural components of source code through different input representations. Beyond the source code itself, elements used within it, such as API knowledge, have previously proven helpful for code summarization with recurrent neural networks (RNNs). In this article, we therefore explore how APIs, together with source code and its structure, can improve the performance of code summarization models. Our model uses a transformer-based architecture with two encoders, one for each input (source code and API sequences), and a joint decoder that combines the outputs of both encoders to generate summaries. We evaluated the proposed model on a dataset of Java projects collected from GitHub containing around 87K <Java Method, API Sequence, Comment> triplets. The experiments show that our model outperforms most existing RNN-based approaches, but its overall performance does not improve on the state-of-the-art transformer-based approach. Thus, although API information is helpful for code summarization, the results point to considerable scope for further research on improving models and leveraging additional API knowledge for code summarization.
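The abstract states only that a joint decoder combines the outputs of the two encoders; it does not specify the fusion mechanism. As a minimal sketch, assuming the encoder outputs are already available as per-token vectors, the snippet below shows one common multi-source strategy: the decoder query attends to the source-code memory and the API-sequence memory separately with scaled dot-product attention, and the two context vectors are mixed with a fixed weight `alpha`. The names `cross_attention`, `joint_decoder_context`, and `alpha` are hypothetical, learned projections and multiple heads are omitted, and this is not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]  # one row per token position


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query: Vector, memory: Matrix) -> Vector:
    """Scaled dot-product attention of one decoder query over an encoder memory.

    Simplification (assumption): keys and values are the raw memory rows,
    with no learned projections and a single head.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, row)) / math.sqrt(d) for row in memory]
    weights = softmax(scores)
    # Weighted sum of memory rows, one output dimension at a time.
    return [sum(w * row[i] for w, row in zip(weights, memory)) for i in range(d)]


def joint_decoder_context(
    query: Vector, code_memory: Matrix, api_memory: Matrix, alpha: float = 0.5
) -> Vector:
    """Hypothetical fusion step for one decoder position.

    Attends to the source-code encoder output and the API-sequence encoder
    output separately, then mixes the two contexts with a fixed weight alpha.
    """
    code_ctx = cross_attention(query, code_memory)
    api_ctx = cross_attention(query, api_memory)
    return [alpha * c + (1 - alpha) * a for c, a in zip(code_ctx, api_ctx)]


if __name__ == "__main__":
    # Toy encoder outputs: 3 code tokens and 2 API tokens, model width 4.
    code_memory = [[0.1, 0.3, -0.2, 0.5], [0.0, 0.2, 0.4, -0.1], [0.3, -0.3, 0.1, 0.2]]
    api_memory = [[0.2, 0.1, 0.0, 0.4], [-0.1, 0.5, 0.3, 0.0]]
    query = [0.2, 0.1, -0.1, 0.3]  # current decoder state
    context = joint_decoder_context(query, code_memory, api_memory)
    print(len(context))  # 4
```

In a trained model the mixing would more likely be learned (for example, through a gate or stacked cross-attention sublayers) than fixed; the constant `alpha` only keeps the sketch dependency-free.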

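The abstract describes the dataset as <Java Method, API Sequence, Comment> triplets but does not say how API sequences were extracted. The hypothetical heuristic below illustrates what such a triplet might look like: it scans a Java method with regular expressions, resolves variable receivers to their declared types, and lists constructor and method calls as `Type.method` in textual order. The regexes, the `Type.new` convention for constructors, the `Triplet` class, and the sample method and comment are all assumptions made for illustration; a realistic pipeline would rely on a Java parser with full type resolution.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical heuristic, not the study's extraction tool.
# Parameters and locals such as "String path)" or "BufferedReader reader =".
DECL_RE = re.compile(r"\b([A-Z]\w*)(?:<[^>]*>)?\s+([a-z]\w*)\s*[=,)]")
# Constructor calls such as "new FileReader(path)".
NEW_RE = re.compile(r"\bnew\s+([A-Z]\w*)\s*\(")
# Qualified calls such as "reader.readLine()" or "Math.abs(x)".
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\.\s*([a-z]\w*)\s*\(")


@dataclass
class Triplet:
    method: str
    api_sequence: List[str]
    comment: str


def extract_api_sequence(java_method: str) -> List[str]:
    """Return API calls as "Type.method" strings in textual order."""
    # Map declared variable names to their (non-generic) type names.
    var_types: Dict[str, str] = {name: typ for typ, name in DECL_RE.findall(java_method)}
    events: List[Tuple[int, str]] = []
    for m in NEW_RE.finditer(java_method):
        events.append((m.start(), f"{m.group(1)}.new"))
    for m in CALL_RE.finditer(java_method):
        receiver, method = m.group(1), m.group(2)
        # Unknown receivers are assumed to be class names (static calls).
        owner = var_types.get(receiver, receiver)
        events.append((m.start(), f"{owner}.{method}"))
    events.sort()
    return [api for _, api in events]


SAMPLE_METHOD = """
public String readFirstLine(String path) throws IOException {
    BufferedReader reader = new BufferedReader(new FileReader(path));
    String line = reader.readLine();
    reader.close();
    return line;
}
"""

if __name__ == "__main__":
    triplet = Triplet(
        method=SAMPLE_METHOD,
        api_sequence=extract_api_sequence(SAMPLE_METHOD),
        comment="Reads the first line of the given file.",  # made-up summary
    )
    print(triplet.api_sequence)
    # ['BufferedReader.new', 'FileReader.new', 'BufferedReader.readLine', 'BufferedReader.close']
```

Nested calls are listed in the order they appear in the text rather than in evaluation order, so `BufferedReader.new` precedes `FileReader.new` here.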
Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Natural language generation
2. Software and its engineering
  1. Software notations and tools
    1. Software maintenance tools

Published in
PROMISE 2022: Proceedings of the 18th International Conference on Predictive Models and Data Analytics in Software Engineering
November 2022
101 pages
ISBN:9781450398602
DOI:10.1145/3558489
General Chair:
Shane McIntosh, University of Waterloo, Canada
Program Chairs:
Weiyi Shang, Concordia University, Canada
Gema Rodriguez Perez, University of British Columbia, Canada
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 November 2022
Author Tags
API sequences
code summarization
source code
transformers
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 64 of 125 submissions, 51%

Article Metrics
- Total Citations: 0
- Total Downloads: 149
- Downloads (Last 12 months): 31
- Downloads (Last 6 weeks): 3
Cited By
This publication has not been cited yet
