Applied Soft Computing

Volume 56, July 2017, Pages 357-367

Cost-Sensitive back-propagation neural networks with binarization techniques in addressing multi-class problems and non-competent classifiers

https://doi.org/10.1016/j.asoc.2017.03.016

Highlights

  • A novel method based on cost-sensitive neural networks with binarization techniques for multi-class problems is developed.

  • The effect of aggregation methods on the proposed method is studied.

  • The positive synergy between the management of non-competent classifiers and the proposed method is found.

  • The effectiveness of our method is investigated on three different kinds of cost matrices.

  • 25 real-world datasets from the KEEL dataset repository are selected for the experimental study.

Abstract

Multi-class classification problems can be addressed using a decomposition strategy. One of the most popular decomposition techniques is the One-vs-One (OVO) strategy, which divides a multi-class classification problem into as many easier-to-solve binary sub-problems as there are pairs of classes. To address the presence of classes with different costs, in this paper we examine the behavior of an ensemble of Cost-Sensitive Back-Propagation Neural Networks (CSBPNN) with the OVO binarization technique for multi-class problems. To this end, the original multi-class cost-sensitive problem is decomposed into one sub-problem per pair of classes, and each sub-problem is learnt independently using CSBPNN. A combination method is then used to aggregate the binary cost-sensitive classifiers. To verify the synergy between the binarization technique and CSBPNN for multi-class cost-sensitive problems, we carry out a thorough experimental study. Specifically, we first check the effectiveness of the OVO strategy for multi-class cost-sensitive learning problems. Then, we compare several well-known aggregation strategies in our scenario. Finally, we explore whether further improvement can be achieved by managing non-competent classifiers. The experimental study is performed with three types of cost matrices, and proper statistical analysis is employed to extract meaningful findings.

Introduction

Classification is one of the most important problems in machine learning and data mining. A classification problem consists of learning a mapping function, called a classifier, from a set of n input examples x1, x2, …, xn labeled by m classes Y = {y1, y2, ..., ym}. Classical classification algorithms attempt to minimize the classification error and usually presume that all classification errors carry equal costs. Unfortunately, under such an assumption, these classifiers often fail in practice when the examples to be classified carry unequal misclassification costs. For example, it is considerably more expensive for a spam classification system to misclassify an email containing important information as spam than to let a spam message through. Indeed, cost-sensitive problems arise in many real-world applications, such as fraud detection [1], medical diagnosis [2], software defect prediction [3], and face recognition [4].

Compared with regular classification, cost-sensitive learning attaches different costs to the different types of misclassification error. The task of cost-sensitive learning is to learn a cost-sensitive classifier h: X → {y1, y2, ..., ym} from a training dataset with misclassification costs that attempts to minimize the expected cost instead of the expected error. In particular, cost-sensitive learning is considered one of the strategies for addressing imbalanced classification problems [5], [6], where the minority classes are assigned higher misclassification costs. In recent years, many methods have been proposed for addressing cost-sensitive classification problems, such as MetaCost [7], the Cost-Sensitive Back-Propagation Neural Network (CSBPNN) [8], cost-sensitive trees [9], cost-sensitive boosting [10], [11], [12] and cost-sensitive nearest neighbors [13].
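The core decision rule of cost-sensitive learning can be made concrete with a small sketch. Assuming estimated class posteriors and a cost matrix C where C[j, k] is the cost of predicting class k when the true class is j (an illustrative formulation, not the paper's CSBPNN itself), the classifier picks the label with minimum expected cost rather than maximum posterior:

```python
import numpy as np

def min_expected_cost_label(posteriors, cost_matrix):
    """Pick the class whose expected misclassification cost is lowest.

    posteriors  : shape (m,), estimated P(y_j | x)
    cost_matrix : shape (m, m), cost_matrix[j, k] = cost of predicting
                  class k when the true class is j (zero diagonal)
    """
    expected_costs = posteriors @ cost_matrix  # expected cost of each label
    return int(np.argmin(expected_costs))

# Toy 3-class example: misclassifying class 0 is expensive.
posteriors = np.array([0.2, 0.5, 0.3])
costs = np.array([[0, 10, 10],
                  [1,  0,  1],
                  [1,  1,  0]])
print(min_expected_cost_label(posteriors, costs))  # → 0
```

Note that the cost-sensitive decision is class 0 even though class 1 has the highest posterior; this is exactly the behavior a cost-insensitive classifier misses.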

Numerous research efforts have indicated that cost-sensitive learning approaches are useful and effective for two-class cost-sensitive problems [7], [14], [15]. However, several studies [8], [16] have shown that cost-sensitive approaches are not as effective for multi-class problems as for two-class problems. The reason is that multi-class cost-sensitive classification problems are considerably harder to address, since misclassification can occur in multiple ways in the multi-class case. Therefore, some works [16], [17] have initiated research on multi-class cost-sensitive problems. However, almost all previous studies on multi-class cost-sensitive classification have addressed the problem directly, without decomposition.

A popular approach called the binarization technique has proven effective for addressing multi-class problems [18], [19]. It simplifies the initial multi-class classification problem using a divide-and-conquer methodology, i.e., by applying a decomposition strategy that divides the original multi-class classification problem into several easier-to-solve binary sub-problems. Several alternatives for the decomposition of multi-class problems can be found in the specialized literature [20]. One of the most popular binarization techniques is the One-vs-One (OVO) scheme, which decomposes a multi-class problem into as many binary sub-problems as there are pairs of classes. Once the binary classifiers are built, an aggregation method, which combines the binary classifiers to produce the final output over the class labels, must be chosen. In recent years, many different methods for combining the outputs of the base classifiers have been proposed, including the voting strategy [21], the weighted voting strategy [22], learning valued preferences for classification [23], and preference relations solved by the non-dominance criterion [24], [25].
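The OVO scheme with voting aggregation can be sketched as follows. This is a minimal illustration using a trivial nearest-centroid learner as a stand-in base classifier (the paper's base classifier is CSBPNN); the function names are hypothetical:

```python
import numpy as np
from itertools import combinations

class CentroidClassifier:
    """Minimal stand-in base learner: predict the class with the nearest centroid."""
    def fit(self, X, y):
        self.labels_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.labels_])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.labels_[d.argmin(axis=1)]

def train_ovo(X, y, make_base=CentroidClassifier):
    """One binary classifier per pair of classes: m*(m-1)/2 sub-problems."""
    classifiers = {}
    for a, b in combinations(np.unique(y), 2):
        mask = (y == a) | (y == b)           # keep only the two paired classes
        classifiers[(a, b)] = make_base().fit(X[mask], y[mask])
    return classifiers

def predict_ovo(classifiers, x):
    """Majority vote over the pairwise predictions for one query pattern."""
    votes = {}
    for clf in classifiers.values():
        w = clf.predict(x.reshape(1, -1))[0]
        votes[w] = votes.get(w, 0) + 1
    return max(votes, key=votes.get)

# Three well-separated 2-D classes, 20 examples each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.1, size=(20, 2)) for c in ([0, 0], [3, 0], [0, 3])])
y = np.repeat([0, 1, 2], 20)
ovo = train_ovo(X, y)                        # 3 classes -> 3 binary classifiers
print(predict_ovo(ovo, np.array([2.9, 0.1])))  # query near class 1
```

Note that every pairwise classifier votes for the query, including the (0, 2) classifier that was never trained on class 1: this is the non-competence issue discussed below.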

Unlike the classical approach, which uses a single fixed classifier to distinguish each pair of classes, several proposals have aimed to extend the ensemble learning approach to the OVO scheme. A dynamic approach for the OVO scheme was proposed in [26], where the best binary classifier was dynamically selected in each sub-problem for each unknown pattern. Recently, in our previous work [27], we employed an ensemble learning technique for each sub-problem derived from the decomposition procedure to address multi-class imbalanced data.

A deeper analysis of the OVO procedure reveals that all binary classifiers are fired for a given unknown pattern, even classifiers that submit an erroneous score to the system and could thus mislead the final decision. This is known as the “non-competent classifiers problem” [22], which hinders the classification of an unknown pattern and can lead to an incorrect label. However, it is impossible to know a priori which classifier should be used for classifying the query instance [28]; otherwise, the classification problem would already be solved. Therefore, the techniques proposed in the specialized literature aim to manage non-competent classifiers through a dynamic procedure. In [28], a method called Dynamic Classifier Selection for the OVO scheme (DCS-OVO) was proposed to select the classes nearest to the query instance, in order to neglect those classifiers that are not related to the unknown pattern. Following this research line, the authors developed an approach known as Distance-based Relative Competence Weighting for the OVO strategy (DRCW-OVO) in [29], where the negative impact of the non-competent classifiers was alleviated by a weighting method based on the distance between the unknown pattern and each of the classes of the problem.
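The distance-based weighting idea can be illustrated with a simplified sketch. This is not the exact DRCW-OVO algorithm of [29] (which uses nearest-neighbour distances); here class prototypes stand in for the distance computation, and all names are hypothetical. Each pairwise score is weighted by the query's relative distance to the two classes involved, so a classifier whose paired classes are both far from the query contributes little:

```python
import numpy as np

def drcw_weights(x, proto_a, proto_b):
    """Relative-competence weights for the class pair (a, b): the class whose
    prototype is closer to x receives the larger weight (weights sum to 1)."""
    da = np.linalg.norm(x - proto_a)
    db = np.linalg.norm(x - proto_b)
    return db / (da + db), da / (da + db)

def predict_drcw(scores, prototypes, x):
    """Weighted aggregation of pairwise scores.

    scores[a][b] in [0, 1] is classifier (a, b)'s confidence for class a
    (1 - scores[a][b] goes to class b); prototypes[c] represents class c.
    """
    m = len(prototypes)
    total = np.zeros(m)
    for a in range(m):
        for b in range(a + 1, m):
            wa, wb = drcw_weights(x, prototypes[a], prototypes[b])
            total[a] += wa * scores[a][b]
            total[b] += wb * (1.0 - scores[a][b])
    return int(np.argmax(total))

# Query near class 1; classifier (0, 2) is non-competent here, and its
# uninformative 0.5/0.5 score is down-weighted for the distant class 2.
prototypes = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
scores = np.array([[0.0, 0.1, 0.5],
                   [0.0, 0.0, 0.9],
                   [0.0, 0.0, 0.0]])
x = np.array([2.9, 0.1])
print(predict_drcw(scores, prototypes, x))  # → 1
```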

The aim of this study is to analyze the hypothesis of using an OVO decomposition scheme with CSBPNN to address multi-class cost-sensitive problems. We design an OVO scheme for multi-class cost-sensitive learning problems. Further, we consider different aggregation approaches for combining binary cost-sensitive classifiers; we also provide an analysis of the management of non-competent classifiers for this scenario.

To draw meaningful conclusions, a thorough empirical analysis is performed in this study. A set of 25 datasets from the KEEL dataset repository is selected for our experimental study. Because the actual cost matrices of the datasets are unknown, we analyze the behavior of the proposed method on different types of cost matrices. Proper statistical analysis techniques, as suggested in [30], [31], are employed to support the findings drawn from the experimental study. The main contributions of this work can be summarized as follows:

  • We analyze experimentally whether a positive synergy between CSBPNN and the OVO strategy exists for addressing multi-class cost-sensitive classification problems.

  • Then, we compare well-known combination methods used for the OVO scheme to examine the behavior of the aggregation strategies in the scenario of multi-class cost-sensitive problems.

  • Finally, since the management of non-competent classifiers for the OVO scheme has proven an appropriate technique for addressing multi-class problems [29], we develop a study to verify whether this strategy can also improve performance on multi-class cost-sensitive problems.
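Since the datasets' true cost matrices are unknown, the experiments rely on generated ones. As a minimal illustrative sketch, not the paper's exact generation procedure (the three concrete matrix types are described in Section 4.2), a random cost matrix with a zero diagonal can be built as:

```python
import numpy as np

def random_cost_matrix(m, low=1.0, high=10.0, seed=0):
    """Illustrative m x m cost matrix: correct predictions cost nothing
    (zero diagonal); off-diagonal misclassification costs are drawn
    uniformly from [low, high]."""
    rng = np.random.default_rng(seed)
    C = rng.uniform(low, high, size=(m, m))
    np.fill_diagonal(C, 0.0)
    return C

C = random_cost_matrix(4)
print(C.diagonal())  # → [0. 0. 0. 0.]
```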

The remainder of this paper is organized as follows. Section 2 introduces the cost-sensitive classification problem including a description of the problem, strategies for cost-sensitive learning, and CSBPNN based on a threshold moving strategy. In Section 3, we describe the OVO decomposition scheme, aggregation methods, and strategies for management of non-competent classifiers. The experimental framework is presented in Section 4, including the selected datasets, cost matrices for the multi-class datasets, parameter settings for the algorithms, performance measures, and statistical tests. Section 5 describes the results and analysis of the complete experimental study. The learned lessons of this study are presented in Section 6 and the conclusions can be found in the final section.

Section snippets

Cost-sensitive classification problems: the CSBPNN model and threshold-moving strategy

The problems of cost-sensitive classification are described in Section 2.1. Then, the strategies used for addressing cost-sensitive problems are presented in Section 2.2. Finally, in Section 2.3, we provide a short description of CSBPNN based on threshold-moving, which is employed as the base classifier in the decomposition scheme of this study.

Decomposition of multi-class classification problems

In this section, we first introduce the OVO decomposition scheme for solving multi-class classification problems in Subsection 3.1. Then, Subsection 3.2 presents the state-of-the-art combination methods for the OVO scheme. Finally, in Subsection 3.3, two well-known strategies for managing non-competent classifiers in an OVO scheme are described.

Experimental framework

In this section, we describe the configuration for the experimental study developed in Section 5. In Subsection 4.1, the datasets chosen from real-world applications are presented. Then, in Subsection 4.2, we explain three different types of cost matrices tested in the experimental study. In Subsection 4.3, we provide a description of parameter settings for the algorithms used in the experimental study. Finally, Subsection 4.4 describes the statistical tests for performance comparison.

Results and discussion

In this section, we present the complete results of the experimental study. We aim to answer the following questions:

  1. Is it necessary to adopt a binarization technique to address multi-class cost-sensitive problems?

  2. Which is the most appropriate aggregation method for combining the binary cost-sensitive classifiers?

  3. Can further improvement be realized with the management of non-competent classifiers for multi-class cost-sensitive problems?

Thereby we divide this section into three parts, each

Lessons learned

In this section, we summarize the most important research findings of this study.

  • Effectiveness of binarization technique. The results demonstrate that the use of a binarization technique is beneficial to address multi-class cost-sensitive problems when compared with both cost-insensitive and cost-sensitive classifiers, which are directly used to handle the multi-class cost-sensitive problems. The statistical tests confirm that in general, significant difference can always be found, which

Conclusions

In this study, we analyzed our hypothesis on the use of a cost-sensitive back-propagation neural network with OVO decomposition scheme to address multi-class cost-sensitive learning problems. In this manner, we were able to extend the method that has been proven effective in two-class cost-sensitive tasks to the significantly more challenging scenarios of multi-class problems, where the relationships among classes are no longer obvious.

Regarding the analysis of the effect of the management of

Acknowledgments

The authors would like to thank the (anonymous) reviewers for their constructive comments. This work is financially supported by the National Natural Science Foundation of China (NSFC Proj. 71171039 and 61273204) and CSC Scholarship Program (CSC NO. 201406080059). It is also supported by the Spanish National Research Project TIN2014-57251-P.

References (45)

  • E. Hüllermeier et al., Learning valued preference structures for solving classification problems, Fuzzy Sets Syst. (2008)
  • S.A. Orlovsky, Decision-making with a fuzzy preference relation, Fuzzy Sets Syst. (1978)
  • A. Fernández et al., Solving multi-class problems with linguistic fuzzy rule based classification systems based on pairwise learning and preference relations, Fuzzy Sets Syst. (2010)
  • I. Mendialdua et al., Dynamic selection of the best base classifier in One versus One, Knowl.-Based Syst. (2015)
  • Z. Zhang et al., Empowering one-vs-one decomposition with ensemble learning for multi-class imbalanced data, Knowl.-Based Syst. (2016)
  • M. Galar et al., Dynamic classifier selection for One-vs-One strategy: avoiding non-competent classifiers, Pattern Recogn. (2013)
  • M. Galar et al., DRCW-OVO: Distance-based relative competence weighting combination for One-vs-One strategy in multi-class problems, Pattern Recogn. (2015)
  • J. Derrac et al., A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms, Swarm Evol. Comput. (2011)
  • K. Daqrouq et al., Speaker identification using vowels features through a combined method of formants, wavelets, and neural network classifiers, Appl. Soft Comput. (2015)
  • G. Dede et al., Speech recognition with artificial neural networks, Digital Signal Process. (2010)
  • S.-H. Yoo et al., Optimized face recognition algorithm using radial basis function neural networks and its practical applications, Neural Netw. (2015)
  • R.C. Prati et al., Class imbalance revisited: a new experimental setup to assess the performance of treatment methods, Knowl. Infor. Syst. (2015)