
Improving Code Summarization With Tree Transformer Enhanced by Position-Related Syntax Complement



Impact Statement:
Current code summarization approaches have focused on improving summary quality by exploiting syntax information from the abstract syntax trees (ASTs) of programming languages (PLs). ASTs are typically fed into neural networks in serialized form. However, current serialization approaches neglect the relative positions of nodes in the AST, which loses syntax information such as the hierarchical and sibling relations between nodes. To overcome this limitation, this article introduces a novel code summarization method that retains the relative position relationships between nodes by adding a tree position embedding to the nodes of the serialized AST. Moreover, we propose a tree attention mechanism that enables code tokens to attend more strongly to tokens located at crucial syntactic positions in the AST.
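
As a rough illustration of what position information a plain serialization discards, the sketch below walks a Python AST in pre-order and records each node's depth (hierarchical relation) and index among its siblings (sibling relation). It is not the SyMer implementation: the function name `serialize_with_positions` and the (depth, sibling index) encoding are assumptions made for this example, and only Python's standard `ast` module is used.

```python
import ast


def serialize_with_positions(source):
    """Pre-order serialization of a Python AST.

    Each entry is (node_type, depth, sibling_index). The pair
    (depth, sibling_index) is a hypothetical stand-in for the kind of
    tree position signal a flat token sequence would otherwise lose.
    """
    tree = ast.parse(source)
    rows = []

    def visit(node, depth, sibling_index):
        rows.append((type(node).__name__, depth, sibling_index))
        # Children are numbered in the order ast.iter_child_nodes yields them.
        for i, child in enumerate(ast.iter_child_nodes(node)):
            visit(child, depth + 1, i)

    visit(tree, 0, 0)
    return rows


rows = serialize_with_positions("x = 1")
for node_type, depth, sibling_index in rows:
    print(f"{'  ' * depth}{node_type} (depth={depth}, sibling={sibling_index})")
```

In a model, each (depth, sibling index) pair could be mapped to a learned vector and added to the node's token embedding; how the article actually constructs its tree position embedding is not specified in this summary.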

Abstract:

Code summarization aims to automatically generate natural language (NL) summaries for a given source code snippet, which helps developers understand source code faster and improves software maintenance. Recent approaches that apply NL techniques to code summarization fall short of adequately capturing the syntactic characteristics of programming languages (PLs), particularly the position-related syntax, from which the semantics of the source code can be extracted. In this article, we present Syntax transforMer (SyMer), a model based on the transformer architecture and enhanced with a position-related syntax complement (PSC) to better capture syntactic characteristics. PSC takes advantage of the unambiguous relations among code tokens in the abstract syntax tree (AST), as well as the attention gathered on crucial code tokens indicated by its syntactic structure. The experimental results demonstrate that SyMer outperforms state-of-the-art models by at least 2.4% bilingual evaluation understudy (BLEU),...
Published in: IEEE Transactions on Artificial Intelligence (Volume: 5, Issue: 9, September 2024)
Page(s): 4776 - 4786
Date of Publication: 30 April 2024
Electronic ISSN: 2691-4581
