Cost-Sensitive back-propagation neural networks with binarization techniques in addressing multi-class problems and non-competent classifiers
Introduction
Classification is one of the most important problems in machine learning and data mining. A classification problem consists of learning a mapping function, called a classifier, from a set of n input examples x1, x2, …, xn labeled with m classes. Classical classification algorithms attempt to minimize the classification error and usually presume that all classification errors carry equal costs. Under this assumption, the resulting classifiers often fail in practice when the examples to be classified involve unequal misclassification costs. For example, it is considerably more expensive for a spam filtering system to misclassify an email containing important information as spam than the reverse. Indeed, cost-sensitive problems exist in many real-world applications, such as fraud detection [1], medical diagnosis [2], software defect prediction [3], and face recognition [4].
Compared with regular classification, cost-sensitive learning attaches different costs to different types of misclassification error. The task of cost-sensitive learning is to learn, from a training dataset with misclassification costs, a cost-sensitive classifier that attempts to minimize the expected cost instead of the expected error. In particular, cost-sensitive learning is considered one of the strategies for addressing imbalanced classification problems [5], [6], in which the minority classes are assigned higher misclassification costs. In recent years, many methods have been proposed to address cost-sensitive classification problems, such as MetaCost [7], the Cost-Sensitive Back-Propagation Neural Network (CSBPNN) [8], cost-sensitive trees [9], cost-sensitive boosting [10], [11], [12], and cost-sensitive nearest neighbors [13].
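To make the objective concrete: given estimated class posteriors and a cost matrix, a cost-sensitive classifier predicts the label with the smallest expected cost rather than the largest posterior. The following is a minimal sketch of this decision rule; the function name and toy numbers are illustrative, not taken from the paper:

```python
import numpy as np

def min_expected_cost_label(posteriors, cost_matrix):
    """Pick the label that minimizes the expected misclassification cost.

    posteriors:  array of shape (m,), estimated P(class j | x).
    cost_matrix: array of shape (m, m); cost_matrix[j, k] is the cost of
                 predicting class k when the true class is j.
    """
    expected_costs = posteriors @ cost_matrix  # expected cost per predicted class
    return int(np.argmin(expected_costs))

# Toy example: misclassifying a true class 0 instance is ten times costlier.
p = np.array([0.3, 0.7])
C = np.array([[0.0, 10.0],
              [1.0,  0.0]])
print(min_expected_cost_label(p, C))  # → 0, despite class 0 having the lower posterior
```

The example shows why a cost-sensitive classifier can disagree with the maximum-posterior rule: predicting class 1 has expected cost 3.0, while predicting class 0 costs only 0.7.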
Numerous research efforts have indicated that cost-sensitive learning approaches are useful and effective for addressing two-class cost-sensitive problems [7], [14], [15]. However, several studies [8], [16] have shown that previous cost-sensitive approaches are not as effective on multi-class problems as on two-class problems. The reason is that multi-class cost-sensitive classification problems are considerably more complex, since misclassification can occur in multiple ways in a multi-class case. Therefore, some works [16], [17] have initiated research on multi-class cost-sensitive problems. However, almost all previous studies on multi-class cost-sensitive classification have tackled the problem directly, without decomposition.
A popular family of approaches known as binarization techniques has been proven effective for addressing multi-class problems [18], [19]. These methods simplify the initial multi-class classification problem using a divide-and-conquer methodology, i.e., by applying a decomposition strategy that divides the original multi-class problem into several easier-to-solve binary sub-problems. Several alternatives for the decomposition of multi-class problems can be found in the specialized literature [20]. One of the most popular binarization techniques is the One-vs-One (OVO) scheme, which decomposes an m-class problem into m(m−1)/2 binary sub-problems, one for each pair of classes. Once the binary classifiers are built, an aggregation method that combines their outputs to produce the final class label must be chosen. In recent years, many different methods for combining the outputs of the base classifiers have been proposed, including the voting strategy [21], the weighted voting strategy [22], learning valued preferences for classification [23], and preference relations solved by the non-dominance criterion [24], [25].
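The OVO pipeline described above can be sketched as follows: one binary classifier is trained per pair of classes, and the simple voting strategy aggregates their predictions. The tiny nearest-centroid base learner is our own illustrative stand-in for any binary classifier (the paper uses CSBPNN):

```python
import numpy as np
from itertools import combinations

class NearestCentroid:
    """Tiny binary base learner, used only to keep the sketch self-contained."""
    def fit(self, X, y):
        self.labels_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.labels_])
        return self
    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.labels_[d.argmin(axis=1)]

def ovo_predict(X_train, y_train, X_test):
    """OVO sketch: one binary classifier per pair of classes, majority voting."""
    classes = np.unique(y_train)
    idx = {c: i for i, c in enumerate(classes)}
    votes = np.zeros((len(X_test), len(classes)), dtype=int)
    for a, b in combinations(classes, 2):
        mask = (y_train == a) | (y_train == b)   # keep only the two classes of this pair
        clf = NearestCentroid().fit(X_train[mask], y_train[mask])
        for row, label in enumerate(clf.predict(X_test)):
            votes[row, idx[label]] += 1          # each pairwise classifier casts one vote
    return classes[votes.argmax(axis=1)]
```

Note that every one of the m(m−1)/2 classifiers votes on every query, including classifiers trained without the query's true class; this is exactly the non-competence issue discussed below.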
Unlike the classical approach of using a single fixed classifier to distinguish each pair of classes, several proposals have aimed to extend ensemble learning to the OVO scheme. A dynamic approach for the OVO scheme was proposed in [26], where the best binary classifier is dynamically selected in each sub-problem for each unknown pattern. Recently, in our previous work [27], we employed an ensemble learning technique for each sub-problem derived from the decomposition procedure to address multi-class imbalanced data.
A closer analysis of the OVO procedure reveals that all binary classifiers are fired for a given unknown pattern, including classifiers whose erroneous scores may mislead the final decision. This issue is known as the “non-competent classifiers problem” [22]; it hinders the classification of an unknown pattern and can lead to an incorrect label. However, it is impossible to know a priori which classifier should be used for a given query instance [28]; otherwise, the classification problem would already be solved. Therefore, the techniques proposed in the specialized literature manage non-competent classifiers through a dynamic procedure. In [28], a method called Dynamic Classifier Selection for the OVO scheme (DCS-OVO) was proposed to select the classes nearest to the query instance, thereby neglecting classifiers unrelated to the unknown pattern. Following this research line, the authors developed Distance-based Relative Competence Weighting for the OVO strategy (DRCW-OVO) [29], in which the negative impact of non-competent classifiers is alleviated by weighting each pairwise score according to the distance between the unknown pattern and each of the classes of the problem.
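The relative-competence weighting idea can be sketched as follows: each pairwise score is weighted by how close the query is to the two classes involved, so pairs containing distant (likely non-competent) classes contribute less to the final decision. This is an illustrative reconstruction of the scheme in [29], not the authors' exact formulation:

```python
import numpy as np

def drcw_aggregate(scores, class_distances, eps=1e-12):
    """DRCW-style aggregation sketch (illustrative reading of [29]).

    scores[i, j]: confidence in class i returned by the pairwise classifier
        trained on classes (i, j); scores[j, i] = 1 - scores[i, j].
    class_distances[i]: distance from the query instance to class i
        (e.g., the average distance to its nearest neighbours of that class).
    """
    m = len(class_distances)
    weighted = np.zeros(m)
    for i in range(m):
        for j in range(m):
            if i != j:
                # Weight is large when the competing class j is far relative to class i,
                # shrinking the influence of pairs that do not involve nearby classes.
                w = class_distances[j] / (class_distances[i] + class_distances[j] + eps)
                weighted[i] += scores[i, j] * w
    return int(np.argmax(weighted))
```

In the test below, class 0 wins even though its raw pairwise score is lower, because the query lies much closer to class 0 than to class 1.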
The aim of this study is to analyze the hypothesis of using an OVO decomposition scheme with CSBPNN to address multi-class cost-sensitive problems. We design an OVO scheme for multi-class cost-sensitive learning problems. Further, we consider different aggregation approaches for combining binary cost-sensitive classifiers; we also provide an analysis of the management of non-competent classifiers for this scenario.
To draw meaningful conclusions, a thorough empirical analysis is performed in this study. A set of 25 datasets from the KEEL dataset repository is selected for the experiments. Because the actual cost matrices of the datasets are unknown, we analyze the behavior of the proposed method under different types of cost matrices. The statistical analysis techniques suggested in [30], [31] are employed to support the findings drawn from the experimental study. Our main contributions can be summarized as follows:
- We analyze experimentally whether a positive synergy exists between CSBPNN and the OVO strategy for addressing multi-class cost-sensitive classification problems.
- Then, we compare the well-known combination methods used for the OVO scheme to examine the behavior of aggregation strategies in the scenario of multi-class cost-sensitive problems.
- Finally, since the management of non-competent classifiers in an OVO scheme has been proven an appropriate technique for addressing multi-class problems [29], we develop a study to verify whether this strategy can also improve performance on multi-class cost-sensitive problems.
The remainder of this paper is organized as follows. Section 2 introduces the cost-sensitive classification problem, including a description of the problem, strategies for cost-sensitive learning, and CSBPNN based on a threshold-moving strategy. In Section 3, we describe the OVO decomposition scheme, aggregation methods, and strategies for the management of non-competent classifiers. The experimental framework is presented in Section 4, including the selected datasets, cost matrices for the multi-class datasets, parameter settings for the algorithms, performance measures, and statistical tests. Section 5 describes the results and analysis of the complete experimental study. The lessons learned from this study are presented in Section 6, and the conclusions can be found in the final section.
Cost-sensitive classification problems: the CSBPNN model and threshold-moving strategy
The cost-sensitive classification problem is described in Section 2.1. Then, the strategies used for addressing cost-sensitive problems are presented in Section 2.2. Finally, in Section 2.3, we provide a short description of CSBPNN based on threshold-moving, which is employed as the base classifier in the decomposition scheme of this study.
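As background for Section 2.3: one common form of the threshold-moving strategy trains the network in the usual cost-insensitive way and rescales its outputs by per-class misclassification costs only at prediction time. The following is a minimal sketch under that reading, not the paper's exact formulation:

```python
import numpy as np

def threshold_moving_predict(outputs, misclass_costs):
    """Threshold-moving sketch: rescale a conventionally trained network's
    outputs by the cost of misclassifying each class, renormalize, and
    pick the class with the largest rescaled output.

    outputs:        array of shape (m,), the network's raw class outputs.
    misclass_costs: array of shape (m,), cost of misclassifying class i.
    """
    rescaled = np.asarray(outputs, dtype=float) * np.asarray(misclass_costs, dtype=float)
    rescaled /= rescaled.sum()  # renormalize so the rescaled outputs sum to 1
    return int(np.argmax(rescaled)), rescaled

# Toy example: raw outputs favor class 0, but class 1 is costlier to miss,
# so the decision threshold effectively moves toward class 1.
label, probs = threshold_moving_predict([0.6, 0.4], [1.0, 3.0])
```

The design point is that no retraining is needed when the cost matrix changes; only the output rescaling is updated.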
Decomposition of multi-class classification problems
In this section, we first introduce the OVO decomposition scheme for solving multi-class classification problems in Subsection 3.1. Then, Subsection 3.2 presents the state-of-the-art combination methods for the OVO scheme. Finally, in Subsection 3.3, two well-known strategies for managing non-competent classifiers in an OVO scheme are described.
Experimental framework
In this section, we describe the configuration for the experimental study developed in Section 5. In Subsection 4.1, the datasets chosen from real-world applications are presented. Then, in Subsection 4.2, we explain three different types of cost matrices tested in the experimental study. In Subsection 4.3, we provide a description of parameter settings for the algorithms used in the experimental study. Finally, Subsection 4.4 describes the statistical tests for performance comparison.
Results and discussion
In this section, we present the complete results of the experimental study. We aim to answer the following questions:
1. Is it necessary to adopt a binarization technique to address multi-class cost-sensitive problems?
2. Which is the most appropriate aggregation method for combining the binary cost-sensitive classifiers?
3. Can further improvement be realized with the management of non-competent classifiers for multi-class cost-sensitive problems?

Thereby, we divide this section into three parts, each devoted to one of these questions.
Lessons learned
In this section, we summarize the most important research findings of this study.
- Effectiveness of the binarization technique. The results demonstrate that the use of a binarization technique is beneficial for addressing multi-class cost-sensitive problems when compared with both cost-insensitive and cost-sensitive classifiers that handle the multi-class problem directly. The statistical tests confirm that, in general, a significant difference can always be found.
Conclusions
In this study, we analyzed our hypothesis on the use of a cost-sensitive back-propagation neural network with OVO decomposition scheme to address multi-class cost-sensitive learning problems. In this manner, we were able to extend the method that has been proven effective in two-class cost-sensitive tasks to the significantly more challenging scenarios of multi-class problems, where the relationships among classes are no longer obvious.
Regarding the analysis of the effect of the management of
Acknowledgments
The authors would like to thank the (anonymous) reviewers for their constructive comments. This work is financially supported by the National Natural Science Foundation of China (NSFC Proj. 71171039 and 61273204) and CSC Scholarship Program (CSC NO. 201406080059). It is also supported by the Spanish National Research Project TIN2014-57251-P.
References (45)
- et al., A cost-sensitive decision tree approach for fraud detection, Expert Syst. Appl. (2013)
- et al., Cost-sensitive case-based reasoning using a genetic algorithm: application to medical diagnosis, Artif. Intell. Med. (2011)
- et al., Software defect prediction using cost-sensitive neural network, Appl. Soft Comput. (2015)
- et al., Discriminative cost sensitive Laplacian score for face recognition, Neurocomputing (2015)
- et al., An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics, Inf. Sci. (2013)
- et al., Cost-sensitive decision tree ensembles for effective imbalanced classification, Appl. Soft Comput. (2014)
- et al., Cost-sensitive boosting for classification of imbalanced data, Pattern Recogn. (2007)
- et al., An overview of ensemble methods for binary classifiers in multi-class problems: experimental study on one-vs-one and one-vs-all schemes, Pattern Recogn. (2011)
- et al., Analysing the classification of imbalanced data-sets with multiple classes: binarization techniques and ad-hoc approaches, Knowl.-Based Syst. (2013)
- et al., Combining predictions in pairwise classification: an optimal adaptive voting strategy and its relation to weighted voting, Pattern Recogn. (2010)
- Learning valued preference structures for solving classification problems, Fuzzy Sets Syst.
- Decision-making with a fuzzy preference relation, Fuzzy Sets Syst.
- Solving multi-class problems with linguistic fuzzy rule based classification systems based on pairwise learning and preference relations, Fuzzy Sets Syst.
- Dynamic selection of the best base classifier in One versus One, Knowl.-Based Syst.
- Empowering one-vs-one decomposition with ensemble learning for multi-class imbalanced data, Knowl.-Based Syst.
- Dynamic classifier selection for One-vs-One strategy: avoiding non-competent classifiers, Pattern Recogn.
- DRCW-OVO: Distance-based relative competence weighting combination for One-vs-One strategy in multi-class problems, Pattern Recogn.
- A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms, Swarm Evol. Comput.
- Speaker identification using vowels features through a combined method of formants, wavelets, and neural network classifiers, Appl. Soft Comput.
- Speech recognition with artificial neural networks, Digital Signal Process.
- Optimized face recognition algorithm using radial basis function neural networks and its practical applications, Neural Netw.
- Class imbalance revisited: a new experimental setup to assess the performance of treatment methods, Knowl. Inf. Syst.