Cost-sensitive learning for defect escalation
Introduction
Building large enterprise software is generally a highly complex and lengthy process, during which numerous software defect reports accumulate, and some of them may not be resolved when the software products are released (usually against a tight deadline) [14]. For example, it may be difficult to reproduce a reported error condition; there may be conflicts between desired product behavior and applicable standards; there may be uncertainty as to whether a requested change concerns a defect or a request for enhancement; or it may be difficult to assess which of several products in a given environment causes a reported error condition. Enterprise software vendors often have sophisticated processes in place for evaluating defect reports before release. This process entails a careful human expert review of each known bug, evaluation of trade-offs, and delicate judgment. Still, after product release a small number of defects are “escalated” by customers whose businesses are seriously impacted. Escalated defects demand the vendor’s immediate management attention and the immediate, continuous effort of senior software engineers to reduce the business or financial loss to the customers. Software defect escalations are therefore costly to software vendors, with the associated costs amounting to millions of dollars each year. In addition, escalations erode customer satisfaction, loyalty, and repeat revenue, as well as the vendor’s reputation, incurring extremely high costs in the long run for enterprise software vendors [3], [5].
In this paper, we further investigate a possible solution: a Software defecT Escalation Prediction (STEP) system, extending our previous work [28]. The objective of the STEP system is to assist human experts in the review of software defect reports by modeling and predicting escalation risk using data mining technologies [4], [15], [20]. If the STEP system can accurately predict the escalation risk of known defect reports, then some escalations can be prevented by correcting the high-risk defects, at much lower cost, within the software development and testing cycle before release. This would save enterprise software vendors a considerable amount of money [9].
Indeed, the business goal of STEP (and of many industrial data mining applications) is to maximize the net profit, that is, the difference in the cost of defect resolution before and after introducing the data mining solution, as opposed to the usual data-mining measures such as accuracy, AUC (area under the ROC curve), misclassification cost [41], lift, or recall-and-precision combinations [26]. However, net profit is not equivalent to any of these standard machine learning measures, and we have found little previous work that directly optimizes net profit as the objective of the data mining effort.
In this paper, we first set up a simple framework in which the problem of maximum net profit can, under certain conditions, be converted to one of minimum total cost in cost-sensitive learning (see Section 2). We then apply and compare four well-known cost-sensitive learning algorithms on a defect report dataset to see how they perform in terms of maximum net profit (Section 5). Our results (see the mini-summaries in Sections 5.5 and 6) suggest that the cost-sensitive decision tree produces the highest positive net profit. The conclusions drawn in this study not only help enterprise software vendors improve profit in software production by reducing the cost of escalations, but also provide general guidelines for mining imbalanced datasets [25], [36], [46] and for cost-sensitive learning.
To the best of our knowledge, applying data mining to predict software defect escalations is novel in the software business. Software development is an extremely complex process, and hundreds or even thousands of defect reports may exist for a large enterprise software product. Predicting and prioritizing defect reports for evaluation and resolution is therefore crucial in software engineering. Our data-mining-based STEP is a first and important step toward improving the effectiveness and efficiency of this process through automated analysis. As we will show in Sections 5 (Comparing cost-sensitive learning approaches for STEP), 6 (Experiments on public datasets), and 7 (Deployment), STEP performs quite well. The system is currently deployed with product groups of a software vendor, where it has quickly become a popular tool for prioritization.
In summary, this is a real-world application paper with four main contributions. (1) It proposes a software defect escalation prediction system. (2) It converts a maximum net profit problem in software engineering to cost-sensitive learning. (3) It introduces negative values in the cost matrix, corresponding to the benefit obtained from correct classification; this is seldom discussed in existing cost-sensitive learning algorithms, which focus on the cost of misclassification. (4) Its comparison of different approaches sheds light on algorithm selection for data mining practitioners and shows data mining researchers how different techniques perform on real-world applications.
The paper is organized as follows: in the next section, we describe maximum net profit and its relationship with cost-sensitive learning. Section 3 reviews several popular cost-sensitive learning approaches. Section 4 then describes the STEP dataset, and Section 5 compares different cost-sensitive learning approaches for maximum net profit. In Section 6, we further investigate the performance of the different approaches on five real-world datasets. Section 7 discusses the deployment of our work, and Section 8 concludes the paper.
Maximum net profit and cost-sensitive learning
As we have discussed in the Introduction, correcting defects after an escalation occurs is much more expensive than correcting them before they escalate. If we treat defect escalations as positive examples, then the cost of a false negative, FN (correcting an escalated defect), should be many times the cost of a false positive, FP (correcting a non-escalated defect). If the costs of FN and FP are known, as in our study, then this would seem to be a straightforward
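The conversion from net profit to total cost can be made concrete with a small numerical sketch. The unit costs below are assumed for illustration only (they are not the figures used in the paper): fixing a flagged defect before release costs 1 unit, while an escalation after release costs 100 units.

```python
# Illustrative sketch with assumed costs (not the paper's actual figures):
# escalations are the positive class; a classifier's business outcome is a
# total cost over its confusion matrix, and net profit is the saving
# relative to the no-model baseline (every escalation occurs at full cost).

COST_FIX_BEFORE = 1      # assumed: correcting any defect flagged pre-release
COST_ESCALATION = 100    # assumed: a defect that escalates after release

def total_cost(tp, fp, fn, tn):
    """Total cost of acting on the model's predictions: every predicted
    positive (TP + FP) is fixed cheaply before release, and every missed
    escalation (FN) incurs the full escalation cost."""
    return (tp + fp) * COST_FIX_BEFORE + fn * COST_ESCALATION

def net_profit(tp, fp, fn, tn):
    """Saving versus doing nothing, where all TP + FN escalations occur."""
    baseline = (tp + fn) * COST_ESCALATION
    return baseline - total_cost(tp, fp, fn, tn)

# A model that catches 40 of 50 escalations at the price of 200 false
# alarms still yields a large positive net profit under these costs:
print(net_profit(tp=40, fp=200, fn=10, tn=10000))  # 3760
```

Under this framing, maximizing net profit and minimizing total cost pick out the same classifier, since the baseline term is a constant of the data; this is the sense in which the two objectives coincide.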
Review of cost-sensitive learning
Cost-sensitive learning is inductive learning that takes costs into consideration. It is one of the most active and important research areas in machine learning, and it plays an important role in real-world data mining applications. It involves a large variety of costs, including misclassification costs, data acquisition costs (instance costs and attribute costs), active learning costs, computation costs, and human–computer interaction
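One family of approaches reviewed later, Costing, turns any cost-insensitive learner into a cost-sensitive one by resampling. A minimal sketch of its core mechanism, cost-proportionate rejection sampling, is given below; the cost values in the usage comment are assumed for illustration.

```python
import random

def cost_proportionate_sample(examples, costs, rng=None):
    """Cost-proportionate rejection sampling (the core idea of Costing):
    keep each example with probability cost / max_cost, so that in the
    resampled set every example is, in expectation, equally costly to
    misclassify.  A cost-insensitive base learner trained on several such
    samples (with voting over the resulting models) becomes cost-sensitive.
    """
    rng = rng or random.Random(0)  # fixed seed for a reproducible sketch
    z = max(costs)                 # normalizer: largest cost in the set
    return [(x, c) for x, c in zip(examples, costs) if rng.random() < c / z]

# Assumed costs for illustration: 10 rare "escalation" records at cost 100,
# 90 ordinary records at cost 1.  The expensive examples always survive
# (probability 100/100 = 1), while most cheap ones are dropped.
records = list(range(100))
costs = [100 if i < 10 else 1 for i in range(100)]
sample = cost_proportionate_sample(records, costs)
```

Note that the expensive minority class is preserved intact while the cheap majority is thinned, which is also why this technique doubles as a remedy for class imbalance.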
The datasets
Our dataset consists of historical defect reports from industry software projects of an enterprise software vendor. Defect reports change over time, so there is an opportunity to learn from multiple versions of a single defect report. Additionally, the same defect can be reported several times by different parties. Therefore, numerous data records in the dataset may belong to a single defect. Confidentiality of the data allows us to give only a brief description of it. The
Comparing cost-sensitive learning approaches for step
In this section we compare the four cost-sensitive learning approaches (Costing, Relabeling, Weighting, and CSTree) reviewed in Section 3. Since the first three are cost-sensitive meta-learning approaches (the fourth, CSTree, is a single cost-sensitive decision tree algorithm), we investigate their performance with different base learning algorithms. More than a dozen algorithms in WEKA [48] had originally been chosen for the first three approaches, but due to the
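The meta-learning approaches differ only in how they inject the cost matrix around a cost-insensitive base learner. As a hedged illustration of the Relabeling idea (in the spirit of MetaCost), the sketch below applies the classic decision-theoretic threshold to estimated class probabilities; the FP and FN costs used in the example are assumed, not the paper's.

```python
def optimal_threshold(cost_fp, cost_fn):
    """Classic two-class decision threshold: predict positive when
    P(positive|x) >= p*, the point where the expected cost of predicting
    positive equals that of predicting negative."""
    return cost_fp / (cost_fp + cost_fn)

def relabel(prob_pos, cost_fp, cost_fn):
    """Relabeling (MetaCost-style): give each training example the label
    that minimizes expected cost under estimated probabilities, then
    retrain any cost-insensitive learner on the relabeled data."""
    p_star = optimal_threshold(cost_fp, cost_fn)
    return [1 if p >= p_star else 0 for p in prob_pos]

# With an assumed FP cost of 1 and FN cost of 100, even a 5%-risk defect
# is relabeled as an escalation, since 0.05 >= 1/101 (about 0.0099):
print(relabel([0.05, 0.005, 0.6], cost_fp=1, cost_fn=100))  # [1, 0, 1]
```

Weighting follows the same logic but, instead of changing labels, assigns each positive example a training weight proportional to the FN/FP cost ratio and leaves the labels alone.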
Experiments on public datasets
To further verify our results, we apply these methods to five publicly available real-world datasets. The characteristics of these datasets are listed in Table 12.
We first investigate the impact of the improvement of CSTree discussed in Section 3 by comparing it with CSTree without pruning (denoted CSTree-NP) and with its previous version (denoted CSDT). The experiments are conducted on the datasets listed in Table 12. We run the experiments
Deployment
Our STEP system has been in deployment in the product group of an enterprise software vendor where the dataset comes from. It has been used to make suggestions on current defect reports with high risks of escalation.
We have evaluated STEP using the defect reports submitted or updated during the most recent three weeks as the test set. Any records corresponding to defect reports that had already been escalated at the time of preparing the dataset were removed. After STEP makes its
Conclusions
In this paper, we have presented a successful case of predicting and preventing escalations from known product defect reports for enterprise software vendors. A software defect escalation prediction (STEP) system based on data mining for maximum net profit has been proposed and tested, and is currently deployed at an enterprise software vendor. The results provide strong evidence that we can indeed make useful predictions about the escalation risk of product defects. The enterprise software
Acknowledgments
The authors would like to thank the anonymous reviewers for their insightful and constructive comments and suggestions that have helped improve the quality of this paper. This research has been supported by the US National Science Foundation (IIS-1115417).
References (52)
- N. Abe, B. Zadrozny, J. Langford, An iterative method for multiclass cost-sensitive learning, in: Proceedings of the...
- et al., An empirical comparison of voting classification algorithms: bagging, boosting, and variants, Mach. Learn. (1999)
- et al., Software defect reduction top 10 list, Computer (2001)
- et al., Data Mining Techniques: For Marketing, Sales, and Customer Support (1997)
- Software Engineering Economics (1981)
- J.P. Bradford, C. Kuntz, R. Kohavi, C. Brunk, C.E. Brodley, Pruning decision trees with misclassification costs, in:...
- U. Brefeld, P. Geibel, F. Wysotzki, Support vector machines with example dependent costs, in: Proceedings of the 14th...
- Bagging Predictors, Mach. Learn. (1996)
- T. Bruckhaus, C.X. Ling, N.H. Madhavji, S. Sheng, Software escalation prediction with data mining, in: Workshop on...
- M. Cebe, C. Gunduz-Demir, Test-cost sensitive classification based on conditioned loss functions, in: Proceedings of the...
- Test-cost sensitive Naïve Bayesian classification
- SMOTE: synthetic minority over-sampling technique, J. Artif. Intell. Res.
- MetaCost: a general method for making classifiers cost-sensitive