Abstract:
Code comments are a crucial source of software documentation that captures various aspects of the code. Such comments play a vital role in understanding the source code a...Show MoreMetadata
Abstract:
Code comments are a crucial source of software documentation that captures various aspects of the code. Such comments play a vital role in understanding the source code and facilitating communication between developers. However, with the iterative release of software, software projects become larger and more complex, leading to a corresponding increase in issues such as mismatched, incomplete, or outdated code comments. These inconsistencies in code comments can misguide developers and result in potential bugs, and there has been a steady rise in reports of such inconsistencies over time. Despite numerous methods being proposed for detecting code comment inconsistencies, their learning effect remains limited due to a lack of consideration for issues such as characterization noise and labeling errors in datasets. To overcome these limitations, we propose a novel approach called MCCL that first removes noise from the dataset and then detects inconsistent code comments in a timely manner, thereby enhancing the model's learning ability. Our proposed model facilitates better matching between code and comments, leading to improved development of software engineering projects. MCCL comprises two components, namely method comment detection and confidence learning denoising. The method comment detection component captures the intricate relationships between code and comments by learning their syntactic and semantic structures. It correlates the code and comments through an attention mechanism to identify how changes in the code affect the comments. Furthermore, confidence learning denoising component of MCCL identifies and removes characterization noises and labeling errors to enhance the quality of the datasets. This is achieved by implementing principles such as pruning noisy data, counting with probabilistic thresholds to estimate noise, and ranking examples to train with confidence. By effectively eliminating noise from the dataset, our model is able to more accurately l...
Published in: IEEE Transactions on Software Engineering ( Volume: 50, Issue: 3, March 2024)