Augmenting commit classification by using fine-grained source code changes and a pre-trained deep neural language model
Introduction
Software evolution, a term coined by Lehman in 1969, has long served as the framework for analyzing the complex ways in which software evolves. Maintenance cost estimation, defect resolution scheduling, and technical debt management are a few of the tasks that drive the high cost of software maintenance, which can reach up to 90% of total cost when compared with the remaining phases of the software development life cycle.
Many studies on software maintenance activities [1], [2], [3], [4] have been carried out to closely monitor the software evolution phase and to give managers insight into how to manage costs and plan development tasks effectively. In this context, several studies [2], [5], [6], [7], [8], [9] have focused on categorizing code changes into three main maintenance categories: (1) Corrective, e.g., fixing errors and faults observed during software use; (2) Perfective, e.g., improving software quality attributes such as performance, maintainability, and usability; and (3) Adaptive, e.g., adapting the software to a new environment (software, hardware, etc.) or adding new functionality. Such categorization supports practitioners in making decisions about resource allocation, the choice of frameworks and technologies, technical debt management, and similar concerns. Technically, source code changes are profiled by their corresponding commit messages, which carry the developer’s explanation of the change made to the code. Commit message classification has been the subject of several studies [2], [3], [4], [10]. These studies have advanced the mining of source code to better characterize the underlying changes. For instance, several studies have analyzed software commits to extract fine-grained change patterns, such as refactoring operations [11], [12], API migrations [13], [14], [15], and bug fixes [16].
To extract useful information from commit messages, existing studies have relied on keywords drawn from those messages and have used them, along with source code changes, to classify commits. However, static keywords, like any embedding method that ignores the context of each word, lead to incorrect classification of ambiguous commit messages, i.e., messages containing words that also appear in commit messages belonging to other maintenance categories.
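The ambiguity problem can be illustrated with a minimal sketch of a context-free keyword classifier. The keyword lists and the function names `keyword_votes` and `classify` are hypothetical and chosen only for illustration; they are not the keyword sets used in the cited studies.

```python
# Hypothetical keyword lists, one per maintenance category (assumption:
# these are illustrative and not taken from any of the cited studies).
KEYWORDS = {
    "corrective": {"fix", "bug", "error", "fault"},
    "perfective": {"refactor", "cleanup", "improve", "performance"},
    "adaptive": {"add", "support", "new", "migrate"},
}


def keyword_votes(message):
    """Count, per category, how many of its keywords occur in the message."""
    tokens = set(message.lower().split())
    return {category: len(tokens & words) for category, words in KEYWORDS.items()}


def classify(message):
    """Return every category tied for the highest keyword count.

    An empty list means no keyword matched; more than one entry means the
    message is ambiguous for a context-free keyword matcher.
    """
    votes = keyword_votes(message)
    best = max(votes.values())
    if best == 0:
        return []
    return sorted(category for category, count in votes.items() if count == best)


# One keyword from each category appears, so the matcher cannot decide.
ambiguous = classify("fix performance of add operation")
# Only corrective keywords appear, so the result is unambiguous.
clear = classify("fix null pointer bug")
```

In the first message, "fix", "performance", and "add" each vote for a different category, so all three tie. A representation that takes the surrounding words into account is one way to break such ties.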
To overcome this limitation, we propose to use Bidirectional Encoder Representations from Transformers (BERT) [17], a pre-trained neural language model that provided state-of-the-art results in many natural language processing tasks including text classification. With the help of transformers, the model can better determine the context of each word in the commit messages.
We also aim at conducting a deep analysis of the impact of the introduction of different source code changes on the overall performance of commit classifiers as well as for each maintenance category. Specifically, we intend to investigate how source code changes support commit classification when using transformer encoders for the commit messages. We will determine which source code changes represent a reliable source for the classification of commits into each one of the adopted maintenance categories (corrective, perfective and adaptive) by combining BERT word representations of the commit message and code changes extracted from the same commit.
We started by creating a labeled dataset of 1793 commits, collected from publicly available datasets, which we have made available to the public. For each commit, we used popular code change mining tools to extract a fine-grained set of source code changes, which we added as features alongside those obtained from the BERT model. Different versions of the dataset were then used to train and test deep neural network (DNN) models, which are later used to classify unseen commits.
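One plausible way to combine the two feature sources is simple concatenation, sketched below. This is an assumption made for illustration: the excerpt does not specify how code changes are encoded, so the change-type vocabulary `CHANGE_TYPES`, the count encoding, and the helper names are hypothetical, and the three-element embedding stands in for a real BERT vector (768 dimensions for BERT-base).

```python
from collections import Counter

# Hypothetical vocabulary of change types (assumption: real change mining
# tools report many more types, and under different names).
CHANGE_TYPES = ["statement_insert", "statement_delete", "extract_method", "rename_method"]


def change_count_vector(changes):
    """Encode the changes mined from one commit as per-type counts.

    Change types outside CHANGE_TYPES are ignored.
    """
    counts = Counter(changes)
    return [float(counts[change_type]) for change_type in CHANGE_TYPES]


def build_features(message_embedding, changes):
    """Concatenate the message embedding with the code change counts."""
    return list(message_embedding) + change_count_vector(changes)


# Stand-in for the BERT representation of a commit message.
embedding = [0.12, -0.40, 0.88]
features = build_features(
    embedding,
    ["statement_insert", "extract_method", "statement_insert"],
)
```

The resulting vector keeps the embedding dimensions first and appends one dimension per change type, so a classifier trained on it sees both the message and the code changes of the same commit.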
The main contributions of this work are summarized as follows:
- The use of BERT to encode commit messages, which gives the commit classifier a better understanding of each word in the message. This improves the performance of the commit classification model, which is itself based on a fully connected deep neural network (DNN).
- An in-depth experimental analysis of the impact of introducing different categories of source code changes on both the overall and the per-category commit classification performance.
- The collection of a balanced benchmark dataset of manually annotated commits. For each commit, an extended set of fine-grained source code changes is also provided.
The rest of the paper is organized as follows: Section 2 provides a survey of previous works on commit classification. Section 3 provides an overview of maintenance and code change categories. This section also introduces the concept of transfer learning in natural language processing and the BERT pre-trained language model. Section 4 presents a description of our proposed commit classification approach. Experiments are presented and discussed in Section 5. Threats to the validity of this work are discussed in Section 6. Finally, Section 7 concludes the paper.
Related work
Understanding and organizing software changes is becoming increasingly challenging, especially for complex projects that undergo many changes during development. In this context, researchers have proposed different methods for commit classification and software change prediction with the aim of improving software quality.
Background
In this section, we start by providing short and simple definitions of the maintenance categories proposed by Swanson [18] and required in this work.
We then provide an introduction to transfer learning in the context of Natural Language Processing (NLP) and present the state-of-the-art transfer learning model: Bidirectional Encoder Representations from Transformers (BERT), a pre-trained neural language model that revolutionized many NLP tasks including text classification.
Methodology
One of the main objectives of this paper is to conduct a deep investigation of the impact of introducing source code changes in the process of automatic commit classification. We propose to use six different models: (1) DNN@BERT: a deep neural network (DNN) model that is trained on BERT’s word representations (BWR) of commit messages, (2) DNN@BERTRefact_cc: a DNN model that combines BWR of commit messages and source code changes obtained by using RefactoringMiner, (3) DNN@BERTCD_cc: a DNN
Experimental results and analysis
In this section, we describe the experimental setup, then report and analyze the results of several experiments that investigate how well the aforementioned models perform on commit classification.
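Per-category performance can be computed as sketched below. The excerpt does not state which metrics the authors report, so precision, recall, and F1 are assumed here, and the function name `per_category_scores` and the toy labels are hypothetical.

```python
CATEGORIES = ["corrective", "perfective", "adaptive"]


def per_category_scores(y_true, y_pred):
    """Compute precision, recall, and F1 for each maintenance category."""
    scores = {}
    for category in CATEGORIES:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == category and p == category)
        fp = sum(1 for t, p in pairs if t != category and p == category)
        fn = sum(1 for t, p in pairs if t == category and p != category)
        # Guard against division by zero when a category is never
        # predicted or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        scores[category] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Toy labels: one corrective commit is misclassified as perfective.
y_true = ["corrective", "corrective", "perfective", "adaptive"]
y_pred = ["corrective", "perfective", "perfective", "adaptive"]
scores = per_category_scores(y_true, y_pred)
```

Reporting scores per category, and not only an overall figure, shows which maintenance categories gain or lose when a given set of code change features is added.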
Threats to validity
This section enumerates all the factors that may influence the correctness of our findings.
Our analysis is mainly threatened by the accuracy of the tools we used to extract the code changes. If these tools miss any code changes, or report inaccurate ones, this will negatively impact our results. However, we selected these tools as they are known for their accuracy, for instance, refactoring studies [12], [38] report that Refactoring Miner has high precision and recall scores in comparison with
Conclusion and future work
In this paper, we presented and compared several models for commit classification. Our approach combines BERT’s word representations with the fine-grained code changes extracted from the source code for each commit. To do so, we used existing code change mining tools to identify fine-grained code changes, refactoring operations, and bug patches. We trained, validated and tested our deep neural network based models on a labeled dataset that we created from a variety of open source projects. Our
CRediT authorship contribution statement
Lobna Ghadhab: Methodology, Software, Investigation, Data curation, Visualization, Writing - original draft. Ilyes Jenhani: Validation, Investigation, Formal analysis, Writing - original draft, Supervision. Mohamed Wiem Mkaouer: Conceptualization, Resources, Formal analysis, Validation, Writing - original draft. Montassar Ben Messaoud: Project administration, Formal analysis, Writing - original draft, Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (39)
- et al., Automatically classifying software changes via discriminative topic model: Supporting multi-category and cross-project, J. Syst. Softw. (2016)
- et al., Feature changes in source code for commit classification into maintenance activities
- et al., Boosting automatic commit classification into maintenance activities by utilizing source code changes
- S. Gharbi, M.W. Mkaouer, I. Jenhani, M. Ben Messaoud, On the classification of software change messages using...
- et al., Automatic classification of large changes into maintenance categories
- et al., Tree2tree neural translation model for learning source code changes (2018)
- et al., Predicting defects using change genealogies
- E.G. Knyazev, Automated source code changes classification for effective code review and analysis, in: Proceedings of...
- et al., Identifying refactorings from source-code changes
- Y. Zhou, A. Sharma, Automated identification of security issues from commit messages and bug reports, in: Proceedings...
- Identifying reasons for software changes using historic databases
- Refdiff: detecting refactorings in version histories
- Accurate and efficient refactoring detection in commit history
- Change distilling: Tree differencing for fine-grained source code change extraction, IEEE Trans. Softw. Eng.
- Fixminer: Mining relevant fix patterns for automated program repair
- Bert: Pre-training of deep bidirectional transformers for language understanding