Knowledge-Based Systems

Volume 66, August 2014, Pages 146-155

Cost-sensitive learning for defect escalation

https://doi.org/10.1016/j.knosys.2014.04.033

Abstract

While most software defects (i.e., bugs) are corrected and tested as part of the prolonged software development cycle, enterprise software vendors often have to release software products before all reported defects are corrected, due to deadlines and limited resources. A small number of these reported defects will be escalated by customers whose businesses are seriously impacted. Escalated defects must be resolved immediately and individually by the software vendors, at a very high cost. The total costs can be even greater, including loss of reputation, satisfaction, loyalty, and repeat revenue. In this paper, we develop a Software defecT Escalation Prediction (STEP) system to mine historical defect report data and predict the escalation risk of current defect reports for maximum net profit. More specifically, we first describe a simple and general framework to convert the maximum net profit problem to cost-sensitive learning. We then apply and compare four well-known cost-sensitive learning approaches for STEP. Our experiments suggest that the cost-sensitive decision tree (CSTree) is the best method for producing the highest positive net profit.

Introduction

Building large enterprise software is generally a highly complex and lengthy process, during which numerous software defect reports can exist and some of them may not be resolved when the software products are released (usually against a tight deadline) [14]. For example, it may be difficult to reproduce a reported error condition; there may be conflicts between desired product behavior and applicable standards; there may be uncertainty as to whether a requested change is related to a defect or a request for enhancement; or it may be difficult to assess which of several products in a given environment may cause a reported error condition. Enterprise software vendors often have in place sophisticated processes for evaluating defect reports before release. This process entails a careful human expert review of each known bug, evaluation of trade-offs, and delicate judgment. Still, after product release a small number of defects become "escalated" by customers whose businesses are seriously impacted. Escalations of software defects require immediate management attention from the software vendor and immediate, continuous effort from senior software engineers to reduce the business or financial loss to the customers. Therefore, software defect escalations are costly to the software vendors, with the associated costs amounting to millions of dollars each year. In addition, software defect escalations result in loss of reputation, satisfaction, loyalty, and repeat revenue of customers, incurring extremely high costs in the long run for the enterprise software vendors [3], [5].

In this paper, we further investigate a possible solution of developing a Software defecT Escalation Prediction (STEP) system. It is an extension of our previous work [28]. The objective of the STEP system is to assist human experts in the review process of software defect reports by modeling and predicting escalation risk using data mining technologies [4], [15], [20]. If the STEP system can accurately predict the escalation risk of known defect reports, then some escalations can be prevented by correcting those high-risk defect reports with a much lower cost within the software development and testing cycle before release. This would save a huge amount of money for the enterprise software vendors [9].

Indeed, the business goal of STEP (and of many industrial applications of data mining) is to maximize the net profit, that is, the difference in the cost of defect resolution before and after introducing the data mining solution, as opposed to the usual data-mining measures such as accuracy, AUC (area under the ROC curve), misclassification cost [41], lift, or recall and precision combinations [26]. However, it is clear that the net profit is not equivalent to any of these standard machine learning measures, and we have found little previous work that directly optimizes the net profit in the data mining effort.

In this paper, we first set up a simple framework in which the problem of maximum net profit can be converted to minimum total cost in cost-sensitive learning under certain conditions (see Section 2). We then apply and compare four well-known cost-sensitive learning algorithms on a defect report dataset to see how they perform in terms of maximum net profit (Section 5). Our results (see the Mini-Summary in Sections 5.5 and 6) suggest that the cost-sensitive decision tree is best for producing the highest positive net profit. The conclusions drawn in this study not only help enterprise software vendors improve profit in software production by reducing the cost of escalations, but also provide some general guidelines for mining imbalanced datasets [25], [36], [46] and for cost-sensitive learning.

To the best of our knowledge, applying data mining to predict software defect escalations is novel in the software business. Software development is an extremely complex process, and hundreds or even thousands of defect reports may exist within a large enterprise software product. Predicting and prioritizing defect reports for evaluation and resolution is crucial in software engineering. Our data-mining-based STEP is a first and important step toward improving the effectiveness and efficiency of this process through automated analysis. As we will show in Sections 5 (comparing cost-sensitive learning approaches for STEP), 6 (experiments on public datasets), and 7 (deployment), STEP performs quite well. The system is currently deployed with product groups of a software vendor, and it has quickly become a popular tool for prioritization.

In summary, this is a real-world application paper. It makes four main contributions. (1) It proposes a software defect escalation prediction system. (2) It converts a maximum net profit problem in software engineering to cost-sensitive learning. (3) It introduces negative values in the cost matrix, corresponding to the benefit obtained from correct classification; this is seldom discussed in existing cost-sensitive learning algorithms, which focus on the cost of misclassification. (4) Its comparison of different approaches sheds light on algorithm selection for data mining practitioners and shows researchers how different techniques perform on real-world applications.
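The negative-cost idea in contribution (3) can be illustrated with a minimal sketch. All numbers below are assumptions for illustration, not the paper's actual cost figures:

```python
# Illustrative 2x2 cost matrix (assumed numbers). Rows index the true class,
# columns the predicted class: 0 = non-escalated, 1 = escalated.
COST = [
    [0.0, 1.0],      # TN: no cost;  FP: one needless pre-release fix
    [100.0, -99.0],  # FN: a full escalation;  TP: a net benefit (negative cost)
]

def min_expected_cost_class(p_escalate):
    """Pick the prediction with the lower expected cost, given P(escalation)."""
    expected = [
        (1 - p_escalate) * COST[0][pred] + p_escalate * COST[1][pred]
        for pred in (0, 1)
    ]
    return 0 if expected[0] <= expected[1] else 1
```

With these assumed costs, even a 2% escalation risk makes "escalated" the cheaper prediction: 0.98 × 1 + 0.02 × (−99) = −1.0, versus 0.02 × 100 = 2.0 for predicting "non-escalated".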

The paper is organized as follows: in the next section, we describe the maximum net profit and its relationship with cost-sensitive learning. Section 3 reviews several popular approaches to cost-sensitive learning. Then Section 4 describes the STEP dataset, and Section 5 compares different cost-sensitive learning approaches for maximum net profit. In Section 6, we further investigate the performance of different approaches on five real-world datasets. Section 7 discusses the deployment of our work, and Section 8 concludes the paper.

Section snippets

Maximum net profit and cost-sensitive learning

As we discussed in the Introduction, correcting defects after an escalation occurs is much more expensive than correcting them before they escalate. If we treat defect escalations as positive examples, then the cost of a false negative FN (correcting an escalated defect) should be many times higher than the cost of a false positive FP (correcting a non-escalated defect). If the costs of FN and FP are known, as in our study, then this would seem to be a straightforward
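The relationship between net profit and the confusion-matrix counts can be made concrete with a minimal sketch (the cost figures are assumptions, not the vendor's actual numbers):

```python
def net_profit(tp, fp, c_esc=100.0, c_fix=1.0):
    """Net profit of deploying the predictor, relative to a no-model baseline
    in which every eventual escalation is resolved post-release at cost c_esc.
    Each true positive saves c_esc - c_fix (the defect is fixed cheaply before
    release); each false positive wastes one pre-release fix of cost c_fix.
    FN and TN incur the same cost as the baseline, so they cancel out."""
    return (c_esc - c_fix) * tp - c_fix * fp
```

Maximizing this quantity is equivalent to minimizing total cost under a cost matrix whose true-positive entry is negative, which is the conversion the framework exploits.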

Review of cost-sensitive learning

Cost-sensitive learning is a form of inductive learning that takes costs into consideration. It is one of the most active and important research areas in machine learning, and it plays an important role in real-world data mining applications. It involves a large variety of costs in data mining and machine learning, including misclassification costs, data acquisition costs (instance costs and attribute costs), active learning costs, computation costs, and human–computer interaction
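One standard result from this literature, Elkan's threshold theorem for two-class problems, is easy to sketch (the cost values used in the usage example are assumptions):

```python
def optimal_threshold(c_fp, c_fn):
    """Classic two-class result: with misclassification costs c_fp and c_fn
    (and zero cost for correct classification), the expected-cost-minimizing
    rule predicts positive whenever P(positive) > c_fp / (c_fp + c_fn)."""
    return c_fp / (c_fp + c_fn)
```

With escalations assumed to be about 100 times costlier than needless fixes, the threshold drops to roughly 0.0099, i.e., even fairly low-risk defect reports become worth flagging.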

The datasets

Our dataset consists of historical defect reports from industry software projects of an enterprise software vendor. Defect reports change over time, so there is an opportunity to learn from multiple versions of a single defect report. Additionally, the same defect can be reported several times by different parties. Therefore, numerous data records in the dataset may belong to a single defect. Confidentiality of the data allows us to give only a brief description of it. The

Comparing cost-sensitive learning approaches for STEP

In this section we compare the four cost-sensitive learning approaches (Costing, Relabeling, Weighting, and CSTree) reviewed in Section 3. Since the first three are cost-sensitive meta-learning approaches (the fourth, CSTree, is a single cost-sensitive decision tree algorithm), we investigate their performance with different base learning algorithms. More than a dozen different algorithms in WEKA [48] were originally chosen for the first three approaches, but due to the
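As a rough illustration of one of these meta-learning approaches, the Costing idea can be sketched as rejection sampling: keep each training example with probability proportional to its misclassification cost, then train any cost-insensitive base learner on the sample. The data and cost figures below are invented for illustration:

```python
import random

def costing_sample(examples, costs, rng=None):
    """Rejection-sample a training set so that a cost-insensitive base learner
    trained on it behaves cost-sensitively: each example is kept with
    probability cost / max(costs)."""
    rng = rng or random.Random(0)
    z = max(costs)
    return [x for x, c in zip(examples, costs) if rng.random() < c / z]

# Invented toy data: 1000 defect reports, 2% of which escalate (label 1).
reports = [(f"defect-{i}", int(i % 50 == 0)) for i in range(1000)]
costs = [100.0 if label else 1.0 for _, label in reports]
sample = costing_sample(reports, costs)
```

Escalations (cost 100, the maximum) are always kept, while non-escalations survive with probability 1/100, so the sample is roughly balanced despite the 50:1 imbalance in the raw data.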

Experiments on public datasets

To further verify our results, we apply these methods to five real-world, publicly available datasets. The characteristics of these datasets are listed in Table 12.

We first investigate the impact of the improvement of CSTree discussed in Section 3, by comparing with CSTree without pruning (denoted as CSTree-NP) and with its previous version denoted as CSDT. The experiments are conducted on the datasets listed in Table 12. We run the experiments

Deployment

Our STEP system has been deployed in the product group of the enterprise software vendor from which the dataset comes. It has been used to make suggestions on current defect reports with high escalation risk.

We evaluated STEP using the defect reports submitted or updated during the most recent three weeks as the test set. Any records corresponding to defect reports that had already been escalated at the time the dataset was prepared were removed. After STEP makes its

Conclusions

In this paper, we have presented a successful case of predicting and preventing escalations from known product defect reports for enterprise software vendors. A Software defecT Escalation Prediction (STEP) system based on data mining for maximum net profit has been proposed and tested, and is currently deployed at an enterprise software vendor. The results provide strong evidence that we can indeed make useful predictions about the escalation risk of product defects. The enterprise software

Acknowledgments

The authors would like to thank the anonymous reviewers for their insightful and constructive comments and suggestions that have helped improve the quality of this paper. This research has been supported by the US National Science Foundation (IIS-1115417).

References (52)

  • N. Abe, B. Zadrozny, J. Langford, An iterative method for multiclass cost-sensitive learning, in: Proceedings of the...
  • E. Bauer et al., An empirical comparison of voting classification algorithms: bagging, boosting, and variants, Mach. Learn. (1999)
  • B.W. Boehm et al., Software defect reduction top 10 list, Computer (2001)
  • M.J.A. Berry et al., Data Mining Techniques: For Marketing, Sales, and Customer Support (1997)
  • B.W. Boehm, Software Engineering Economics (1981)
  • J.P. Bradford, C. Kuntz, R. Kohavi, C. Brunk, C.E. Brodley, Pruning decision trees with misclassification costs, in:...
  • U. Brefeld, P. Geibel, F. Wysotzki, Support vector machines with example dependent costs, in: Proceedings of the 14th...
  • L. Breiman, Bagging predictors, Mach. Learn. (1996)
  • T. Bruckhaus, C.X. Ling, N.H. Madhavji, S. Sheng, Software escalation prediction with data mining, in: Workshop on...
  • M. Cebe, C. Gunduz-Demir, Test-cost sensitive classification based on conditioned loss functions, in: Proceedings of the...
  • X. Chai et al., Test-cost sensitive Naïve Bayesian classification
  • N.V. Chawla et al., SMOTE: synthetic minority over-sampling technique, J. Artif. Intell. Res. (2002)
  • N.V. Chawla, N. Japkowicz, A. Kolcz (Eds.), Special Issue on Learning from Imbalanced Datasets, SIGKDD, vol. 6(1), ACM...
  • S. Chulani, B.W. Boehm, Modeling software defect introduction, California Software Symposium, November...
  • H. Dai (Ed.), Proceedings of the International Workshop on Data Mining for Software Engineering and Knowledge...
  • P. Domingos, MetaCost: a general method for making classifiers cost-sensitive
  • C. Drummond, R.C. Holte, C4.5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling, in:...
  • C. Elkan, The foundations of cost-sensitive learning, in: Proceedings of the International Joint Conference of...
  • W. Fan, S.J. Stolfo, J. Zhang, P.K. Chan, AdaCost: misclassification cost-sensitive boosting, in: Proceedings of the...
  • U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy (Eds.), Advances in Knowledge Discovery and Data Mining,...
  • Y. Freund, R.E. Schapire, Experiments with a new boosting algorithm, in: Proceedings of International Conference on...
  • N. Japkowicz, Concept-learning in the presence of between-class and within-class imbalances, in: Proceedings of the...
  • M.V. Joshi, R.C. Agarwal, V. Kumar, Mining needles in a haystack: classifying rare classes via two-phase rule...
  • U. Knoll, G. Nakhaeizadeh, B. Tausend, Cost-sensitive pruning of decision trees, in: Proceedings of the 8th European...
  • M. Kubat, S. Matwin, Addressing the curse of imbalanced training sets: one-sided selection, in: Proceedings of the 14th...
  • C.X. Ling, C. Li, Data mining for direct marketing: specific problems and solutions, in: Proceedings of the Fourth...