BEQAIN: An Effective and Efficient Identifier Normalization Approach With BERT and the Question Answering System

Abstract:

Identifiers are one of the most important resources for expressing the semantics of source code, but they are usually composed of several common or domain-specific terms and abbreviations, which heavily hinders developers from analyzing and comprehending source code. Hence, it is necessary to normalize identifiers, that is, to align the vocabulary found in identifiers with the natural language words found in other software artifacts. Although researchers have proposed several identifier normalization approaches, these approaches rely only on the lexical information in identifiers and related source code entities, and so lack a deep semantic understanding of identifiers. In this paper, we propose BEQAIN, an effective and efficient identifier normalization approach that splits identifiers into their composing words and expands the enclosed abbreviations. Specifically, BEQAIN employs a deep learning model, composed mainly of a Bidirectional Encoder Representations from Transformers (BERT) layer and a Conditional Random Fields (CRF) layer, to embed identifiers into low-level vectors and learn the identifier splitting patterns. The BERT-CRF network is also combined with a pre-processing component and a post-processing component to resolve the problems of over-splitting and under-splitting and thereby improve identifier splitting performance. Furthermore, BEQAIN employs a Question Answering (Q&A) system to learn the abbreviation expansion mappings and leverages the current programming context to determine the correct expansion when a specific abbreviation has multiple expansions. After BEQAIN is fully trained, it can be used to normalize identifiers. We conduct extensive experiments to validate the effectiveness and efficiency of BEQAIN on two publicly available datasets with nine projects. Experimental results show that BEQAIN achieves an overall average Accuracy of 80.20% and outperforms the ex...
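
To make the task concrete, the following is a minimal sketch of identifier normalization as a purely lexical baseline: split on underscores and case boundaries, then expand abbreviations by table lookup. It is not BEQAIN's BERT-CRF or Q&A model. The function names and the abbreviation table are hypothetical and chosen only for illustration.

```python
import re
from typing import Dict, List

# Hypothetical abbreviation table, for illustration only (not from the paper).
ABBREVIATIONS: Dict[str, str] = {
    "btn": "button",
    "cnt": "count",
    "msg": "message",
    "idx": "index",
}


def split_identifier(identifier: str) -> List[str]:
    """Split an identifier on underscores, case boundaries, and digit runs."""
    tokens: List[str] = []
    for chunk in re.split(r"[_\W]+", identifier):
        # Alternatives, in order: an uppercase run not followed by a
        # lowercase letter (acronym), an optionally capitalized lowercase
        # word, or a run of digits.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [token.lower() for token in tokens]


def normalize_identifier(
    identifier: str, abbreviations: Dict[str, str] = ABBREVIATIONS
) -> List[str]:
    """Split an identifier, then expand each token found in the table."""
    return [abbreviations.get(token, token) for token in split_identifier(identifier)]


if __name__ == "__main__":
    print(normalize_identifier("msgCnt"))
    print(normalize_identifier("btn_idx2"))
    print(split_identifier("HTTPResponse"))
```

A lookup of this kind cannot split same-case concatenations, and it cannot choose among several candidate expansions for one abbreviation. Those are the cases the abstract says are handled by learned splitting patterns and by the current programming context.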

Published in: IEEE Transactions on Software Engineering ( Volume: 49, Issue: 4, 01 April 2023)

Page(s): 2597 - 2620

Date of Publication: 08 December 2022

DOI: 10.1109/TSE.2022.3227559
