Elsevier

Applied Soft Computing

Volume 49, December 2016, Pages 1085-1093

Classification with reject option for software defect prediction

https://doi.org/10.1016/j.asoc.2016.06.023

Highlights

  • We propose the use of classification with reject option for software defect prediction (SDP) as a way to incorporate additional knowledge in the SDP process.

  • We propose two variants of the extreme learning machine with reject option.

  • An ELM with reject option for imbalanced datasets is proposed.

  • The proposed method is tested on five real world software datasets.

  • An example is shown to illustrate how the rejected software modules can be further analyzed to improve the final SDP accuracy.

Abstract

Context

Software defect prediction (SDP) is an important task in software engineering. Along with estimating the number of defects remaining in software systems and discovering defect associations, classifying the defect-proneness of software modules plays an important role in software defect prediction. Several machine-learning methods have been applied to handle the defect-proneness of software modules as a classification problem. This type of “yes” or “no” decision is an important drawback in the decision-making process and if not precise may lead to misclassifications. To the best of our knowledge, existing approaches rely on fully automated module classification and do not provide a way to incorporate extra knowledge during the classification process. This knowledge can be helpful in avoiding misclassifications in cases where system modules cannot be classified in a reliable way.

Objective

We seek to develop an SDP method that (i) incorporates a reject option in the classifier to improve the reliability of the decision-making process; and (ii) makes it possible to postpone the final decision on rejected modules, leaving it to expert analysis or to another classifier that uses extra domain knowledge.

Method

We develop an SDP method called rejoELM and its variant, IrejoELM. Both methods are built upon the weighted extreme learning machine (ELM) with a reject option, which makes it possible to postpone the final decision on non-classified modules (the rejected ones) to a later moment. While rejoELM aims to maximize the accuracy for a given rejection rate, IrejoELM maximizes the F-measure. Hence, IrejoELM becomes an alternative for classification with reject option on imbalanced datasets.

Results

rejoELM and IrejoELM are tested on five datasets of source code metrics extracted from real-world open-source software projects. Results indicate that, for several rejection rates, rejoELM achieves an accuracy comparable to that of some state-of-the-art classifiers with reject option. Although IrejoELM shows lower accuracies for several rejection rates, it clearly outperforms all other methods when the F-measure is used as the performance metric.

Conclusion

It is concluded that rejoELM is a valid alternative for classification with reject option problems when classes are nearly equally represented. On the other hand, IrejoELM is shown to be the best alternative for classification with reject option on imbalanced datasets. Since SDP problems are usually characterized as imbalanced learning problems, the use of IrejoELM is recommended.

Introduction

Software defect prediction (SDP) remains an important research topic in the software engineering field after more than 30 years of research [1]. SDP approaches focus on [1]: (i) estimating the number of defects remaining in software systems; (ii) discovering defect associations; and (iii) classifying the defect-proneness of software modules into defect-prone and not defect-prone. Building a successful SDP system may provide a means to allocate test resources more efficiently, thus reducing software development costs [2]. Interest in SDP has grown in recent years, as reported in [3], a recent review paper that covers 208 related works. Among the commonly used techniques, machine learning methods have achieved the most significant results.

Traditionally, machine learning for SDP is modeled as a classification problem. Software modules are categorized as defect-prone or non defect-prone, and each module is represented by a set of metrics [4]. An algorithm is trained on this dataset so that it can distinguish between the two categories given the vector of metrics of a software module.

Although previous works handle important issues, all approaches designed so far are based on fully automated methods. Given a vector of metrics from a module, the system automatically assigns it to the defect-prone or non defect-prone class. This automated procedure does not provide a way to incorporate any human expert knowledge. Such knowledge is especially useful when facing situations that are significantly different from the ones available in the training set. The knowledge of human experts (e.g., software developers, maintainers, and testers) can also be useful in critical applications where a classification error (usually a failure to detect a defective module) may have serious consequences.

In such situations, a possible solution is to incorporate a reject option into the classifier. In doing so, the classifier may either choose between the two classes or decline to classify (reject) the sample. The rejected sample may be further analyzed by a specialist, who makes the final decision. The decision to reject a sample is based on the degree of certainty that the classifier has. When both classes are almost equally probable, the classifier chooses to reject the sample.
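
The rejection rule described above can be illustrated with a minimal sketch. It assumes the classifier outputs an estimated probability that a module is defect-prone and rejects whenever neither class reaches a confidence threshold (a Chow-style rule). The function names, the threshold value of 0.7, and the module names are hypothetical and chosen only for illustration; this is not the decision rule of rejoELM or IrejoELM.

```python
from typing import Dict, List, Optional, Tuple


def classify_with_reject(p_defect: float, threshold: float = 0.7) -> Optional[int]:
    """Return 1 (defect-prone), 0 (not defect-prone), or None (rejected).

    p_defect is an assumed estimate of P(defect-prone | metrics).
    The sample is rejected when the more probable class is still
    less probable than `threshold`.
    """
    if not 0.0 <= p_defect <= 1.0:
        raise ValueError("p_defect must lie in [0, 1]")
    if not 0.5 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0.5, 1]")
    confidence = max(p_defect, 1.0 - p_defect)
    if confidence < threshold:
        return None  # both classes are too close: defer to an expert
    return 1 if p_defect >= 0.5 else 0


def triage(scores: Dict[str, float],
           threshold: float = 0.7) -> Tuple[Dict[str, int], List[str]]:
    """Split modules into automatically decided ones and rejected ones."""
    decided: Dict[str, int] = {}
    rejected: List[str] = []
    for module, p in scores.items():
        label = classify_with_reject(p, threshold)
        if label is None:
            rejected.append(module)
        else:
            decided[module] = label
    return decided, rejected


if __name__ == "__main__":
    # Hypothetical module names and scores, for illustration only.
    scores = {"Parser.java": 0.92, "Cache.java": 0.08, "Router.java": 0.55}
    decided, rejected = triage(scores)
    print("decided:", decided)
    print("sent to specialist:", rejected)
```

Raising the threshold rejects more modules and leaves fewer, more reliable automatic decisions; lowering it towards 0.5 recovers a fully automated classifier.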

Classification with reject option is a paradigm that has been successfully applied in many areas, most extensively in medical applications. Examples include vertebral column diseases [5], tumor detection [6], and breast cancer diagnostics [7]. In these medical applications, the proposed classifiers with reject option aim to reduce the workload of the medical doctor rather than to act as an automatic diagnostic system. The workload is reduced because most cases can be classified correctly and only the most difficult ones are analyzed by the specialist.

Similarly to what is done in medical applications, classification with reject option can provide significant improvements in SDP problems. In complex software systems comprising a large number of modules, testing expenditures may represent a significant share of the total cost. By using classification with reject option, the classifier may detect most of the defect-prone modules. The rejected ones (those about which the classifier is not certain) may be sent to a reduced team of specialists (e.g., software maintainers and testers).

This paper proposes the application of classifiers with reject option to software defect prediction problems. Additionally, the authors propose two classifiers with reject option based on the extreme learning machine (ELM). ELM is a supervised learning method that has shown good results in many problems, as can be seen in [8], [9], [10]. Its main advantages are its fast training procedure and its simple formulation. The proposed methods (rejoELM and IrejoELM) are built upon the weighted ELM [11]. While rejoELM aims to maximize the accuracy for a given rejection rate, IrejoELM maximizes the F-measure. Hence, IrejoELM presents an alternative for classification with reject option on imbalanced datasets. The methods are tested on five datasets extracted from real-world software projects, and the results are compared to several classification with reject option algorithms available in the literature.
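
To see why the choice between accuracy and F-measure matters on imbalanced data, consider the following sketch. It computes the rejection rate, the accuracy, and the F-measure of a set of predictions in which rejected samples are marked as None. It assumes that accuracy and F-measure are computed over the accepted (non-rejected) samples only and that the defect-prone class is the positive class labeled 1; the function name and the toy data are hypothetical, and the paper's exact evaluation protocol may differ.

```python
from typing import Dict, List, Optional


def evaluate_with_reject(y_true: List[int],
                         y_pred: List[Optional[int]]) -> Dict[str, float]:
    """Rejection rate plus accuracy and F-measure on accepted samples.

    y_pred uses None for rejected samples; 1 is the positive
    (defect-prone) class. Ratios with a zero denominator are reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    accepted = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]

    tp = sum(1 for t, p in accepted if t == 1 and p == 1)
    tn = sum(1 for t, p in accepted if t == 0 and p == 0)
    fp = sum(1 for t, p in accepted if t == 0 and p == 1)
    fn = sum(1 for t, p in accepted if t == 1 and p == 0)

    rejection_rate = (n - len(accepted)) / n if n else 0.0
    accuracy = (tp + tn) / len(accepted) if accepted else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    return {"rejection_rate": rejection_rate,
            "accuracy": accuracy,
            "f_measure": f_measure}


if __name__ == "__main__":
    # Toy imbalanced data: 8 clean modules, 2 defect-prone modules.
    y_true = [0] * 8 + [1] * 2
    # A classifier that never predicts "defect-prone" and rejects one sample.
    y_pred = [0] * 8 + [0, None]
    print(evaluate_with_reject(y_true, y_pred))
```

In this toy case the accuracy on accepted samples is 8/9 even though no defect-prone module is detected, while the F-measure is 0, which is the kind of behavior that motivates optimizing the F-measure when classes are imbalanced.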

The remainder of this paper is organized as follows. Section 2 presents a brief literature review of recent works on machine learning for SDP. Section 3 presents some important concepts related to software defect prediction, classification with reject option, and extreme learning machines. The proposed methods are presented in Section 4. Experiments and a discussion of the results are presented in Sections 5 and 6, respectively. Threats to validity are discussed in Section 7, and conclusions are drawn in Section 8.


Related work

Different machine learning methods have been used to solve the software defect prediction problem. Neural networks [2], random forests [12], logistic regression [13] and support vector machines [14] are some of the methods available in the literature.

Apart from the applications of standard classification methods, several works have addressed important issues related to SDP. In [2], the authors point out that non defect-prone modules occur more frequently than defect-prone ones. This fact may lead to an

Software defect prediction

In software engineering, a software defect (also known as a software bug) is an error, or fault, in a software system that manifests during its execution, leading it to behave erroneously or improperly (i.e., differently from what is expected). In the software life-cycle, most software defects arise from mistakes and errors made by software engineers in either the source code or its design. Software debugging is the process of locating and fixing defects [22]. However, according to the

Defect prediction metrics and dataset

As previously mentioned in Section 3.1, most SDP approaches are based on diverse information, such as source code metrics (e.g., lines of code and complexity) and process metrics (e.g., number of changes and recent activity). Source code metrics relate to the software source code itself and can be extracted from it fully automatically using a proper extraction tool (e.g., inFusion, STAN4J, and Metrics

Experiments and results

The performance of rejoELM and IrejoELM was assessed for five SDP datasets comprising CK+OO metrics from real world open-source software projects. Two thirds of the data were used for training and the remaining third was used for testing. All hyper-parameters were chosen using grid search and 5-fold cross validation. Table 2 displays information regarding the size of each data set, the number of positive examples (faulty modules) and the imbalance ratio (IR). IR quantifies the imbalance degree

Discussion

On the basis of our experiments, we can state that the performance of rejoELM is quite similar to that of rejoRBF. Thus, computational complexity plays an important role in selecting the most appropriate method. In this regard, ELM is known to be less complex than RBF. Hence, rejoELM can be considered a valid alternative for classification with reject option.

Even though rejoELM results are comparable to those of other state-of-the-art classification with reject option methods, it

Construct validity

In this work, the proposed methods were tested on five different datasets. These datasets were extracted following a well-defined process. Any mistake made during dataset construction is a possible threat to the validity of our work. However, several authors in the specialized literature on software defect prediction have used the same datasets to validate their work. This fact increases our confidence in the consistency and reliability of the datasets.

Internal validity

Conclusions

In this paper, we propose an ELM classifier with reject option and apply it to software defect prediction. The proposed method, named rejoELM, was tested on five real-world datasets and outperformed other commonly used machine learning methods with reject option. For all datasets, source code metrics were used in the experiments.

The use of the reject option paradigm aims to change the standard fully automated classification procedure used in previous works to a semi-automated defect

References (34)

  • A. Bounsiar et al., General solution and learning method for binary classification with performance constraints, Pattern Recognit. Lett. (2008)

  • G.B. Huang et al., Extreme learning machine: theory and applications, Neurocomputing (2006)

  • Q. Song et al., A general software defect-proneness prediction framework, IEEE Trans. Softw. Eng. (2011)

  • T. Hall et al., A systematic literature review on fault prediction performance in software engineering, IEEE Trans. Softw. Eng. (2012)

  • S. Lessmann et al., Benchmarking classification models for software defect prediction: a proposed framework and novel findings, IEEE Trans. Softw. Eng. (2008)

  • A.R. da Rocha Neto et al., Diagnostic of pathology on the vertebral column with embedded reject option

  • F. Condessa et al., Classification with reject option using contextual information
