Interpretability application of the Just-in-Time software defect prediction model
Introduction
Software defect prediction has been one of the most active research topics in software engineering since the 1970s. Software has become an essential factor affecting the national economy, the military, politics, and even social life, and complex systems depend heavily on the reliability of the software they employ. Software defects are a potential source of system errors, failures, and even crashes (Wong et al., 2017). Therefore, defect repair is a critical activity in software maintenance, but it also consumes considerable time and resources (Marks et al., 2011).
Defects have an essential impact on software quality and even on software economics. For example, the National Institute of Standards and Technology (NIST) estimates that software defects cost the United States as much as $60 billion a year, and that identifying and fixing these flaws earlier could save the United States $22 billion (Newman, 2002).
Statistics show that fixing defects accounts for 50% to 75% of the total cost of software development (LaToza et al., 2006). At the same time, the complexity and variability of defect distribution, together with the deficiencies of existing defect prediction techniques in solving practical problems, motivate further research. Existing software defect prediction methods can be divided into static and dynamic methods (Zubrow, 2009). Static defect prediction uses defect-related measurement data to predict the defect proneness, defect density, or defect count of program modules. Dynamic defect prediction uses the times at which defects or failures occur to model the distribution of system defects over time, in order to discover how defects are distributed over the software life cycle or some of its stages. The rationale is that, as development proceeds, activities such as requirements analysis and coding introduce more defects as the workload and staffing change, while reviews and testing reduce the number of defects; in general, assuming that the process and the capability of the technology remain stable, the number of defects is proportional to the scale of the software. Traditional defect prediction technology can no longer discover software defects in a timely manner and suffers from obvious inefficiency (Eyolfson et al., 2011).
To address these challenges, researchers in software engineering proposed Just-in-Time defect prediction, a technique that predicts defects in every code change submitted by developers. In Just-in-Time defect prediction, the predicted software entity is a single code change. Its immediacy lies in the fact that it can analyze a code change as soon as a developer submits it and predict the likelihood that the change is defective. This technology can therefore effectively cope with the challenges faced by traditional defect prediction, mainly in the following three aspects:
(1) Fine-grained. Prediction at the code change level targets finer-grained software entities than module- or file-level defect prediction. As a result, developers can spend less time and effort reviewing the code changes predicted to be defective.
(2) Just-in-Time. Defects can be predicted at the moment a code change is submitted. At this point, developers still remember the changed code well and do not need to spend time re-understanding their submission, which helps them fix defects in a more timely manner.
(3) Easy to trace. The developer's information is saved with each submitted code change. As a result, the project manager can more easily find the developer who introduced a defect, which facilitates timely analysis of the defect's cause and helps complete defect assignment (Kamei et al., 2012).
Although machine learning models perform outstandingly in many fields, such as face recognition, image classification, and natural language processing, this performance depends on highly nonlinear models and parameter tuning. There is no direct way to fathom what such models learn from the data or how they reach their final decisions. This "end-to-end" decision-making results in models that are exceptionally hard to explain: from a human perspective, the decision-making process of the model is incomprehensible, i.e., the model is unexplainable. This inexplicability carries many potential dangers, and it makes trust between humans and machines difficult to establish. Since an unexplainable model cannot provide reliable supporting information, its actual deployment in many fields is severely limited. For example, an automatic medical diagnosis model that lacks interpretability may produce wrong treatment plans and even seriously threaten patients' lives. Therefore, the lack of interpretability has become one of the main obstacles to developing and applying machine learning in real-world tasks.
Machine learning model interpretability has a wide range of potential applications, including model validation, model diagnosis, auxiliary analysis, and knowledge discovery. Interpretability means that we have enough understandable information to solve a problem. Specifically, in artificial intelligence, an explainable model can provide the decision basis for each prediction result. For example, Fig. 1 describes how a model used for medical diagnosis assistance proves its credibility to doctors: the model not only gives its prediction result (flu) but also provides the basis for that conclusion (sneezing and headache as evidence, no fatigue as a counter-example). Only then do doctors have reason to believe that the diagnosis is justified and well-founded, avoiding tragic misdiagnoses.
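The local, per-prediction explanation described above is the idea behind LIME-style surrogates. A minimal sketch with NumPy alone: perturb the instance, weight the perturbations by proximity, and fit a weighted linear model whose coefficients serve as the explanation. The black-box model and feature names below are purely hypothetical stand-ins, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_proba(X):
    # Hypothetical black-box classifier: nonlinear in its inputs.
    score = 1.5 * X[:, 0] - 2.0 * X[:, 1] ** 2 + 0.5 * X[:, 2]
    return 1.0 / (1.0 + np.exp(-score))  # P(defective)

x0 = np.array([1.0, 0.5, -0.2])          # instance to explain

# 1. Perturb the instance in its neighbourhood.
Z = x0 + rng.normal(scale=0.3, size=(500, 3))
y = predict_proba(Z)

# 2. Weight samples by proximity to x0 (Gaussian kernel).
w = np.exp(-np.sum((Z - x0) ** 2, axis=1) / (2 * 0.3 ** 2))

# 3. Fit a weighted linear surrogate; its coefficients are the explanation.
Zb = np.hstack([Z, np.ones((len(Z), 1))])   # add intercept column
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(Zb * sw[:, None], y * sw, rcond=None)

for name, c in zip(["feat_a", "feat_b", "feat_c"], coef[:3]):
    print(f"{name}: {c:+.3f}")
```

The signs of the surrogate coefficients recover the local behaviour of the black box around `x0`: the first and third features push the prediction up, the second pushes it down near `x1 = 0.5`.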
We illustrate our research in the form of three research questions:
- RQ1: How efficient is our prediction model? In previous studies, Mockus and Weiss evaluated their prediction model on only one large-scale telecommunications system project (Lessmann et al., 2008), which may make the results unreliable. To better evaluate our prediction model and strengthen the experimental evidence, we used data sets from six open-source projects published by Lessmann et al. (2008). Furthermore, to better identify defects caused by code changes, we built a new defect prediction model based on Kamei's previous work. In the new model, the precision for predicting defects caused by code changes is 68%, and the recall is 64%.
- RQ2: Which features do interpretability techniques identify as playing a significant role in the prediction? Up to now, Just-in-Time defect prediction has only predicted the probability that a change is defective. What types of defects are being predicted, and where are they located? There is currently no research on these questions. The defect type describes the cause and characteristics of a defect, and the defect location refers to the module, file, function, or even the line of code where the defect resides. Information about the type and location of a defect has excellent potential to help developers fix it quickly. Although researchers have proposed some defect classification and localization techniques, there is no related research on predicting defect classification and location in the Just-in-Time setting. In this experiment, we used interpretable models to assess the number of files (NF), relative loss measures (LA/LF and LT/NF), and the time interval between the last change and the current change. The experiment found that whether a change repairs a defect (PD) was the most important feature.
- RQ3: After removing unimportant features, what is the performance of the defect model? At present, Just-in-Time software defect prediction still suffers from low efficiency caused by the heavy workload of large software projects. We hope that the most influential features can be screened out by the explanatory model in a preliminary step, so that the model retains high predictive power at as little cost as possible. Our results show that we can achieve 96% of the prediction model's original capacity at 45% of the original effort.
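The screening idea behind RQ1 and RQ3 can be sketched as follows. This is an illustrative sketch on synthetic data, not the study's six projects: scikit-learn's `make_classification` stands in for real change-level metrics. We train a random forest, keep only the features it ranks most important, and compare cross-validated recall before and after pruning.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for change-level defect data: 14 metrics, 6 informative.
X, y = make_classification(n_samples=1000, n_features=14,
                           n_informative=6, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42)

# Baseline: 10-fold cross-validated recall on all features.
full_score = cross_val_score(rf, X, y, cv=10, scoring="recall").mean()

# Rank features by importance and keep only the top 6.
rf.fit(X, y)
top = np.argsort(rf.feature_importances_)[::-1][:6]

# Re-evaluate on the pruned feature set.
pruned_score = cross_val_score(rf, X[:, top], y, cv=10,
                               scoring="recall").mean()
print(f"full: {full_score:.3f}  pruned: {pruned_score:.3f}")
```

With fewer features the model is cheaper to compute and review while, as in the paper's result, typically retaining most of its predictive power.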
Just-in-Time Defect prediction technology
During software development and maintenance, code must be modified to remove inherent software defects, improve existing functions, refactor existing code, or improve operating performance. However, some code changes may accidentally introduce new defects after completing the modification task (Wong et al., 2010). Therefore, developers want a defect prediction model that can quickly and accurately determine whether a committed code change is a buggy code change (that is, a code change that introduces a defect).
Case study design
In this part, we describe the preliminary preparation for answering the three research questions under study. We introduce the data set used in the experiment and its pre-processing.
RQ1: How efficient is our prediction model?
Overview: To answer RQ1, we use the feature criteria selected in the table to build a software change risk prediction model based on Random Forest. To accurately evaluate the performance of the prediction model, we validate it on open-source project data sets.
Validation technique and data used: Before the experiment, we used the 10-Fold cross-validation method for the preliminary processing of the data set (Efron, 1983). Firstly, the data set is randomly selected, and then the
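The 10-fold procedure used here can be sketched as follows (with an illustrative data-set size; the real experiment applies it to the project data sets described above): shuffle the indices once, split them into 10 folds, and use each fold in turn as the test set while the other 9 folds form the training set.

```python
import numpy as np

rng = np.random.default_rng(1983)
n = 100                                   # illustrative data-set size
idx = rng.permutation(n)                  # shuffle once
folds = np.array_split(idx, 10)           # 10 disjoint folds

for k, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
    # ...fit the model on train_idx, evaluate on test_idx...
    assert len(set(train_idx) & set(test_idx)) == 0   # folds never overlap
```

Averaging the per-fold scores gives an estimate that uses every sample for both training and testing exactly once.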
Limitations and threats to validity
Construct validity. A large body of previous work shows that the parameters of the Random Forest classification technique impact the performance of defect models (Mitchell, 2011; Mende, 2010; Mende and Koschke, 2009; Tantithamthavorn et al., 2016; Tantithamthavorn et al., 2018). Although we used the default value of 100 trees for the random forest prediction model, recent studies show that the parameters of the random forest model do not affect our research results.
Conclusion
In this paper, an interpretability model is used to explain and optimize the defect model. We validated our experiment with an in-depth study of six open-source projects. Our experimental results show that the random forest model in RQ1 can predict software defects well, with a precision of 71.52% and a recall of 68.88%. In RQ2, we innovatively used the LIME model to explain the software defect prediction model and its results. The existing Just-in-Time defect prediction technique only predicts whether a code change is likely to be defective.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This research was supported by the 2021 Key R&D Program in Shaanxi Province, China (2021GY-041) and the National Natural Science Foundation of China special project capability-based construction method and execution mechanisms for ubiquitous operating systems (62141208).
References (56)
- et al. (2009). Investigating the effect of dataset size, metrics sets, and feature selection techniques on software fault prediction problem. Inform. Sci.
- et al. (2020). A comparative study of general fuzzy min–max neural networks for pattern classification problems. Neurocomputing.
- et al. (2015). Software defect prediction using ensemble learning on selected features. Inf. Softw. Technol.
- et al. (2019). Fine-grained just-in-time defect prediction. J. Syst. Softw.
- et al. (2017). Be more familiar with our enemies and pave the way forward: A review of the roles bugs played in software failures. J. Syst. Softw.
- et al. (2020). Class imbalance reduction (CIR): a novel approach to software defect prediction in the presence of class imbalance. Symmetry.
- Bird, C., Nagappan, N., Murphy, B., Gall, H., Devanbu, P. (2011). Don't touch my code! examining the effects of...
- et al. (2016). A framework for evaluating the results of the SZZ approach for identifying bug-introducing changes. IEEE Trans. Softw. Eng.
- et al. An extensive comparison of bug prediction approaches.
- (1983). Estimating the error rate of a prediction rule: improvement on cross-validation. J. Amer. Statist. Assoc.
- Predicting faults using the complexity of code changes.
- It's not a bug, it's a feature: how misclassification impacts bug prediction.
- Studying just-in-time defect prediction using cross-project models. Empir. Softw. Eng.
- The effects of over and under sampling on fault-prone module detection.
- Defect prediction: Accomplishments and future challenges.
- A large-scale empirical study of just-in-time quality assurance. IEEE Trans. Softw. Eng.
- Finding conclusion stability for selecting the best effort predictor in software effort estimation. Autom. Softw. Eng.
- Evaluation of sampling-based ensembles of classifiers on imbalanced data for software defect prediction problems. SN Comput. Sci.
- Classifying software changes: Clean or buggy? IEEE Trans. Softw. Eng.
- Dealing with noise in defect prediction.
- Automatic identification of bug-introducing changes.
- Benchmarking classification models for software defect prediction: A proposed framework and novel findings. IEEE Trans. Softw. Eng.
- Using tri-relation networks for effective software fault-proneness prediction. IEEE Access.
- Predicting the severity of bug reports based on feature selection. Int. J. Softw. Eng. Knowl. Eng.