Augmenting commit classification by using fine-grained source code changes and a pre-trained deep neural language model

https://doi.org/10.1016/j.infsof.2021.106566

Abstract

Context:

Analyzing software maintenance activities is very helpful in ensuring cost-effective evolution and development activities. The categorization of commits into maintenance tasks supports practitioners in making decisions about resource allocation and managing technical debt.

Objective:

In this paper, we propose to use a pre-trained neural language model, namely BERT (Bidirectional Encoder Representations from Transformers), to classify commits into three categories of maintenance tasks: corrective, perfective and adaptive. The proposed commit classification approach helps the classifier better understand the context of each word in the commit message.

Methods:

We built a balanced dataset of 1793 labeled commits collected from publicly available datasets. We used several popular code change distillers to extract fine-grained code changes, which we incorporated into our dataset as features in addition to BERT’s word representation features. A deep neural network (DNN) classifier was added as an extra layer to fine-tune the BERT model on the commit classification task. Several models were evaluated to analyze in depth the impact of code changes on the classification performance for each commit category.
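Assuming the code-change features are combined with the message representation by simple concatenation, the feature construction described above can be sketched as follows. This is a minimal sketch, not the authors' implementation: the change-type vocabulary `CHANGE_TYPES` and the helpers `code_change_vector` and `fuse_features` are hypothetical, and the toy 4-dimensional embedding stands in for BERT's output (768 dimensions for BERT-base).

```python
from typing import Dict, List, Sequence

# Hypothetical vocabulary of change types (placeholders). Real names would come
# from the code change distillers, e.g. RefactoringMiner or Fixminer.
CHANGE_TYPES: List[str] = [
    "Extract Method",
    "Rename Variable",
    "STATEMENT_INSERT",
    "CONDITION_EXPRESSION_CHANGE",
]


def code_change_vector(changes: Dict[str, int], vocab: Sequence[str] = CHANGE_TYPES) -> List[float]:
    """Map a commit's change-type counts onto a fixed-order count vector.

    Change types missing from `changes` get a count of 0; change types that
    are not in `vocab` are ignored.
    """
    return [float(changes.get(kind, 0)) for kind in vocab]


def fuse_features(message_embedding: Sequence[float], changes: Dict[str, int]) -> List[float]:
    """Concatenate a message embedding with the commit's code-change counts."""
    return list(message_embedding) + code_change_vector(changes)


# Toy 4-d embedding standing in for a BERT representation of the commit message.
embedding = [0.1, -0.2, 0.3, 0.0]
fused = fuse_features(embedding, {"Extract Method": 2, "STATEMENT_INSERT": 1})
print(len(fused))  # 8
```

Keeping the vocabulary in a fixed order matters: every commit must map to the same feature positions, otherwise the downstream classifier would see inconsistent inputs.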

Results and conclusions:

Experimental results show that the DNN model trained on BERT’s word representations and Fixminer code changes (DNN@BERT+Fix_cc) provided the best performance, achieving 79.66% accuracy and a macro-average F1 score of 0.8. Comparison with the state-of-the-art model that combines keywords and code changes (RF@KW+CD_cc) showed that our model achieved an improvement of approximately 8% in accuracy. Results also showed that a DNN model using only BERT’s word representation features achieved an improvement of 5% in accuracy compared with the RF@KW+CD_cc model.
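For reference, the two reported metrics can be computed as below. Accuracy is the share of correctly classified commits, while the macro-average F1 score is the unweighted mean of the per-category F1 scores, so each of the three categories counts equally regardless of its size. The sketch uses only the standard library; the function names and the toy predictions are illustrative and are not data from the study.

```python
from typing import Sequence

LABELS = ("corrective", "perfective", "adaptive")


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of commits whose predicted category matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for(label: str, y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """One-vs-rest F1 score for a single maintenance category."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0  # precision or recall is 0 (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str] = LABELS) -> float:
    """Unweighted mean of the per-category F1 scores."""
    return sum(f1_for(label, y_true, y_pred) for label in labels) / len(labels)


y_true = ["corrective", "corrective", "perfective", "perfective", "adaptive", "adaptive"]
y_pred = ["corrective", "perfective", "perfective", "perfective", "adaptive", "corrective"]
print(round(accuracy(y_true, y_pred), 3), round(macro_f1(y_true, y_pred), 3))
```

On these toy predictions the per-category F1 scores are 0.5, 0.8 and about 0.667, so the script prints `0.667 0.656`.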

Introduction

Since the term was coined by Lehman in 1969, software evolution has been the focus of studies analyzing the complex nature of how software evolves. Maintenance cost estimation, defect resolution scheduling and technical debt management are a few of the many tasks that drive the high cost of software maintenance, which can reach up to 90% of the total cost when compared with the remaining phases of the software development life cycle.

Many studies focusing on the analysis of software maintenance activities [1], [2], [3], [4] have been carried out to closely monitor the software evolution phase and to provide managers with insights on how to effectively manage costs and plan development tasks. In this context, several studies [2], [5], [6], [7], [8], [9] have focused on categorizing code changes into three main maintenance categories: (1) Corrective, e.g., fixing errors and faults observed during software use; (2) Perfective, e.g., improving software quality attributes such as performance, maintainability and usability; and (3) Adaptive, e.g., adapting the software to a new environment (software, hardware, etc.) or adding new functionalities. Such categorization supports practitioners in making decisions about various aspects such as resource allocation, the choice of frameworks and technologies, and the management of technical debt. Technically, source code changes are described by their corresponding commit messages, which represent the developer’s explanation of the change performed in the code. Commit message classification has been the subject of several studies [2], [3], [4], [10]. These studies have advanced the mining of source code to better characterize the underlying changes. For instance, several studies have analyzed software commits to extract fine-grained change patterns, such as refactoring operations [11], [12], API migrations [13], [14], [15] and bug fixes [16].

To extract useful information from commit messages, existing studies used keywords extracted from them. These keywords have been used along with source code changes to classify commits. However, static keywords, or any other embedding method that does not take into account the context of each word in the commit message, can lead to the misclassification of ambiguous commit messages, i.e., messages containing words that also appear in commit messages belonging to other maintenance categories. For instance, the word "fix" may describe the repair of a fault (corrective) as well as a purely stylistic clean-up such as fixing indentation (perfective).
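The limitation can be illustrated with a bag-of-keywords feature extractor. In this sketch the keyword lists in `KEYWORDS` and the helper names are hypothetical, not those used by earlier studies; the point is that two messages with different intents receive identical feature vectors, because each keyword is counted regardless of the words around it.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical keyword lists per maintenance category (illustration only).
KEYWORDS: Dict[str, List[str]] = {
    "corrective": ["fix", "bug", "error", "crash"],
    "perfective": ["refactor", "cleanup", "rename", "simplify"],
    "adaptive": ["add", "support", "feature", "upgrade"],
}


def keyword_features(message: str) -> Dict[str, int]:
    """Count category keywords in a message, ignoring their context."""
    counts = Counter(re.findall(r"[a-z]+", message.lower()))
    return {category: sum(counts[w] for w in words) for category, words in KEYWORDS.items()}


def keyword_classify(message: str) -> str:
    """Pick the category with the most keyword hits (ties go to the first listed)."""
    features = keyword_features(message)
    return max(features, key=features.get)


bug_fix = "Fix null pointer in parser"
style_fix = "Fix indentation in parser"
print(keyword_features(bug_fix) == keyword_features(style_fix))  # True
print(keyword_classify(style_fix))  # corrective
```

Both messages map to the same counts, so no classifier trained on these features alone can separate them.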

To overcome this limitation, we propose to use Bidirectional Encoder Representations from Transformers (BERT) [17], a pre-trained neural language model that has achieved state-of-the-art results in many natural language processing tasks, including text classification. Because BERT’s Transformer encoder attends to the words on both sides of each token, the representation it produces for a word depends on its surrounding words, which helps the model better determine the context of each word in a commit message.
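The idea of context-dependent word representations can be illustrated with a single toy self-attention step. The sketch below implements single-head scaled dot-product self-attention with identity query, key and value projections over hypothetical 2-dimensional embeddings. BERT itself uses learned projections, multiple attention heads and a stack of Transformer layers, so this only shows the mechanism, not BERT's actual representations.

```python
import math
from typing import Dict, List

Vector = List[float]

# Hypothetical 2-d static embeddings (not BERT's learned values).
EMBED: Dict[str, Vector] = {
    "fix": [1.0, 0.0],
    "crash": [0.9, 0.1],
    "indentation": [0.1, 0.9],
}


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[str]) -> List[Vector]:
    """Single-head scaled dot-product self-attention, identity Q/K/V projections.

    Each output vector is a softmax-weighted average of all token vectors,
    so it depends on the other words in the message.
    """
    xs = [EMBED[t] for t in tokens]
    dim = len(xs[0])
    outputs = []
    for query in xs:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in xs]
        weights = softmax(scores)
        outputs.append([sum(w * x[j] for w, x in zip(weights, xs)) for j in range(dim)])
    return outputs


fix_in_bug_context = self_attention(["fix", "crash"])[0]
fix_in_style_context = self_attention(["fix", "indentation"])[0]
print(fix_in_bug_context != fix_in_style_context)  # True: same word, different vectors
```

The static embedding of "fix" is identical in both messages, yet its attended representation differs, which is the property a keyword-based feature set lacks.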

We also aim to conduct an in-depth analysis of the impact of introducing different source code changes on the overall performance of commit classifiers as well as on their performance for each maintenance category. Specifically, we investigate how source code changes support commit classification when commit messages are encoded with Transformer encoders. By combining BERT word representations of the commit message with code changes extracted from the same commit, we determine which source code changes represent a reliable source for classifying commits into each of the adopted maintenance categories (corrective, perfective and adaptive).

We started by creating a labeled dataset of 1793 commits collected from publicly available datasets, which we have in turn made publicly available.¹ For each commit, we used popular code change mining tools to extract a fine-grained set of source code changes, which we included as additional features alongside the features obtained from the BERT model. Different versions of the dataset were then used to train and test deep neural network (DNN) models, which are later used to classify unseen commits.
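This section does not state how the labeled commits were divided for training and testing. One common option that preserves the class balance of the dataset in both parts is a stratified split, sketched below as an assumption rather than as the authors' protocol; the function name `stratified_split`, the 20% test ratio and the fixed seed are illustrative choices.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Commit = Tuple[str, str]  # (commit_id, label)


def stratified_split(
    commits: Sequence[Commit], test_ratio: float = 0.2, seed: int = 42
) -> Tuple[List[Commit], List[Commit]]:
    """Split commits so each label keeps (about) the same share in train and test."""
    by_label: Dict[str, List[Commit]] = defaultdict(list)
    for commit in commits:
        by_label[commit[1]].append(commit)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train: List[Commit] = []
    test: List[Commit] = []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_ratio)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


toy = [(f"{label}-{i}", label) for label in ("corrective", "perfective", "adaptive") for i in range(10)]
train, test = stratified_split(toy)
print(len(train), len(test))  # 24 6
```

Splitting per label, rather than over the whole list, guarantees that each maintenance category keeps the same proportion in the training and test sets.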

The main contributions of this work can be summarized as follows:

  • The use of BERT to encode commit messages, which provides the commit classifier with a better understanding of each word in the message and thereby improves the performance of the commit classification model, itself based on a fully connected deep neural network (DNN).

  • An in-depth experimental analysis of the impact of introducing different categories of source code changes on both the overall and the per-category commit classification performance.

  • The collection of a balanced benchmark dataset of manually annotated commits. For each commit, an extended set of fine-grained source code changes is also provided.
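The fully connected DNN mentioned in the first contribution ends in layers that map the (possibly fused) feature vector to scores for the three maintenance categories. The sketch below shows only the forward pass of such a head (dense layer, ReLU, dense layer, softmax) with hypothetical toy weights; the layer sizes, the activation and the helper names are assumptions. In fine-tuning, these weights would typically be learned jointly with BERT's by backpropagation rather than set by hand.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
LABELS = ("corrective", "perfective", "adaptive")


def dense(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> List[float]:
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Sequence[float]) -> List[float]:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax turning logits into probabilities."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(
    features: Sequence[float], w1: Matrix, b1: Sequence[float], w2: Matrix, b2: Sequence[float]
) -> Tuple[str, List[float]]:
    """Forward pass: dense -> ReLU -> dense -> softmax over the three categories."""
    hidden = relu(dense(features, w1, b1))
    probs = softmax(dense(hidden, w2, b2))
    best = max(range(len(probs)), key=lambda i: probs[i])
    return LABELS[best], probs


# Hypothetical toy weights: 4 input features -> 2 hidden units -> 3 categories.
W1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
B1 = [0.0, 0.0]
W2 = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
B2 = [0.0, 0.0, 0.0]

label, probs = classify([1.0, 0.0, 0.5, 0.5], W1, B1, W2, B2)
print(label)  # corrective
```

With real inputs, the feature vector would be the BERT representation of the commit message, optionally concatenated with code-change features, and the output layer would keep one unit per maintenance category.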

The rest of the paper is organized as follows: Section 2 provides a survey of previous works on commit classification. Section 3 provides an overview of maintenance and code change categories. This section also introduces the concept of transfer learning in natural language processing and the BERT pre-trained language model. Section 4 presents a description of our proposed commit classification approach. Experiments are presented and discussed in Section 5. Threats to the validity of this work are discussed in Section 6. Finally, Section 7 concludes the paper.


Related work

Understanding and organizing software changes is becoming increasingly challenging, especially in complex projects that undergo many changes during their development. In this context, researchers have proposed various methods for commit classification and software change prediction to improve software quality.

Background

In this section, we start by providing short and simple definitions of the maintenance categories proposed by Swanson [18] and used in this work.

We then provide an introduction to transfer learning in the context of Natural Language Processing (NLP) and present the state-of-the-art transfer learning model: Bidirectional Encoder Representations from Transformers (BERT), a pre-trained neural language model that revolutionized many NLP tasks including text classification.

Methodology

One of the main objectives of this paper is to conduct a deep investigation of the impact of introducing source code changes in the process of automatic commit classification. We propose to use six different models: (1) DNN@BERT: a deep neural network (DNN) model that is trained on BERT’s word representations (BWR) of commit messages, (2) DNN@BERT+Refact_cc: a DNN model that combines BWR of commit messages and source code changes obtained by using RefactoringMiner, (3) DNN@BERT+CD_cc: a DNN

Experimental results and analysis

In this section, we describe the experimental setup and then report and analyze the results of several experiments that investigate the efficacy of the aforementioned models for commit classification.

Threats to validity

This section enumerates the factors that may influence the correctness of our findings.

Our analysis is mainly threatened by the accuracy of the tools we used to extract the code changes. If these tools miss code changes or report inaccurate ones, our results will be negatively impacted. However, we selected these tools because they are known for their accuracy; for instance, refactoring studies [12], [38] report that Refactoring Miner has high precision and recall scores in comparison with

Conclusion and future work

In this paper, we presented and compared several models for commit classification. Our approach combines BERT’s word representations with the fine-grained code changes extracted from the source code for each commit. To do so, we used existing code change mining tools to identify fine-grained code changes, refactoring operations, and bug patches. We trained, validated and tested our deep-neural-network-based models on a labeled dataset that we created from a variety of open source projects. Our

CRediT authorship contribution statement

Lobna Ghadhab: Methodology, Software, Investigation, Data curation, Visualization, Writing - original draft. Ilyes Jenhani: Validation, Investigation, Formal analysis, Writing - original draft, Supervision. Mohamed Wiem Mkaouer: Conceptualization, Resources, Formal analysis, Validation, Writing - original draft. Montassar Ben Messaoud: Project administration, Formal analysis, Writing - original draft, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (39)

  • M. Yan et al., Automatically classifying software changes via discriminative topic model: Supporting multi-category and cross-project, J. Syst. Softw. (2016).
  • R.V.R. Mariano et al., Feature changes in source code for commit classification into maintenance activities.
  • S. Levin et al., Boosting automatic commit classification into maintenance activities by utilizing source code changes.
  • S. Gharbi, M.W. Mkaouer, I. Jenhani, M. Ben Messaoud, On the classification of software change messages using...
  • A. Hindle et al., Automatic classification of large changes into maintenance categories.
  • S. Chakraborty et al., Tree2tree neural translation model for learning source code changes (2018).
  • K. Herzig et al., Predicting defects using change genealogies.
  • E.G. Knyazev, Automated source code changes classification for effective code review and analysis, in: Proceedings of...
  • P. Weissgerber et al., Identifying refactorings from source-code changes.
  • Y. Zhou, A. Sharma, Automated identification of security issues from commit messages and bug reports, in: Proceedings...
  • A. Mockus et al., Identifying reasons for software changes using historic databases.
  • D. Silva et al., RefDiff: Detecting refactorings in version histories.
  • N. Tsantalis et al., Accurate and efficient refactoring detection in commit history.
  • J. Falleri, F. Morandat, X. Blanc, M. Martinez, M. Monperrus, Fine-grained and accurate source code differencing, in:...
  • B. Fluri et al., Change distilling: Tree differencing for fine-grained source code change extraction, IEEE Trans. Softw. Eng. (2007).
  • M. Martinez, M. Monperrus, Coming: a tool for mining change pattern instances from git commits, in: Proceedings of the...
  • A. Koyuncu et al., FixMiner: Mining relevant fix patterns for automated program repair (2018).
  • J. Devlin et al., BERT: Pre-training of deep bidirectional transformers for language understanding.
  • E.B. Swanson, The dimensions of maintenance, in: Proceedings of the 2nd International Conference on Software...