
Information Fusion

Volume 2, Issue 3, September 2001, Pages 163-168

Multiple classifiers combination by clustering and selection

https://doi.org/10.1016/S1566-2535(01)00033-1

Abstract

This paper proposes a novel algorithm for combining multiple classifiers based on a clustering-and-selection technique (called M3CS), which finds the regions of the feature space where each classifier performs best. The method consists of two steps: clustering and selection. In the clustering step, the feature space is partitioned into several regions by separately clustering the correctly and incorrectly classified training samples of each classifier, and the classifier's performance in each region is estimated. In the selection (operation) step, the classifier that is most accurate in the vicinity of the input sample is nominated to provide the final decision of the committee. A performance comparison between M3CS and Kuncheva's CS+DT method, as well as simple aggregation methods such as maximum, minimum, average, and majority vote, confirms the validity of the proposed scheme.

Introduction

The aim of designing a pattern recognition system is to achieve the best possible classification performance for the given task. This objective traditionally leads to the development of several different classifier designs, among which the best-performing classifier is selected as the final solution. However, it has been observed that different classifier designs, which potentially offer complementary information about the patterns to be classified [1], [5], [6], [7], [8], [9], can be used simultaneously to achieve considerably better performance than that of the best individual classifier.

Algorithms for combining multiple classifiers take one of two approaches: classifier fusion and classifier selection. In classifier fusion, all classifiers are assumed to be equally “experienced” over the whole feature space, and their outputs are combined in some manner to reach a consensus [9], [10], [11], [12]. Classifier selection, in contrast, assumes that each classifier has expertise in some local regions of the feature space and attempts to determine which classifier is most likely to be correct for an unknown sample.
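
For concreteness, the sketch below (not from the paper) implements the fixed fusion rules referred to in the abstract and experiments, operating on per-class support vectors. The function name `fuse` and the array layout are our own illustrative assumptions.

```python
import numpy as np

def fuse(outputs, rule="average"):
    """Combine per-class support vectors from several classifiers.

    outputs: array of shape (n_classifiers, n_classes); each row holds one
    classifier's support (e.g. estimated posterior) for every class.
    Returns the index of the winning class.
    """
    if rule == "majority":
        # Each classifier votes for its top class; the modal label wins.
        votes = outputs.argmax(axis=1)
        return int(np.bincount(votes, minlength=outputs.shape[1]).argmax())
    reducers = {"average": np.mean, "maximum": np.max, "minimum": np.min}
    support = reducers[rule](outputs, axis=0)
    return int(support.argmax())

# Example: two classifiers, two classes; average support is [0.55, 0.45].
fuse(np.array([[0.7, 0.3], [0.4, 0.6]]), rule="average")  # -> 0
```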

The classifier selection scheme requires a method of partitioning the feature space and estimating the performance of each classifier. In Woods' DCS-LA approach [2], the classification accuracy is estimated in a small region of the feature space surrounding the unknown test sample, and the most locally accurate classifier is then nominated to make the final decision. This method, however, is time-consuming, because the accuracy must be estimated anew for every test sample. Kuncheva [3] presented an algorithm that statically selects the best classifier: the training data are clustered to form the decision regions, and a confidence interval is used to decide whether one classifier or several should make the decision. However, the number of clusters must be predetermined, and the class labels of the training samples are disregarded in the clustering procedure.
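
A rough sketch of this kind of dynamic, local-accuracy-based selection, assuming scikit-learn-style classifiers with a `predict` method; the neighbourhood size k = 10 and the Euclidean metric are illustrative choices, not settings reported in [2]:

```python
import numpy as np

def dcs_la_predict(classifiers, X_train, y_train, x, k=10):
    """Nominate the classifier with the highest local accuracy around x."""
    # Find the k training samples nearest to the test point x.
    nn = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
    # Estimate each classifier's accuracy on that neighbourhood.
    local_acc = [np.mean(clf.predict(X_train[nn]) == y_train[nn])
                 for clf in classifiers]
    # The locally most accurate classifier makes the final decision.
    best = int(np.argmax(local_acc))
    return classifiers[best].predict(x.reshape(1, -1))[0]
```

Because the neighbourhood search and the per-classifier accuracy estimate run once per test point, the cost the paper criticizes is visible directly in the loop structure.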

In this paper, a clustering [4] and selection based algorithm (called M3CS) is proposed to integrate multiple classifiers. For each classifier, the training samples are divided into correctly and incorrectly classified subsets, which are then clustered separately to form a partition of the feature space. Because the classifiers' error characteristics differ, the partitions resulting from different classifiers are generally not the same. For a given partition, the performance of the corresponding classifier in each region is estimated using training or validation data. In the operation phase, the classifier that is most accurate in the vicinity of the input sample is appointed to make the final decision. See Fig. 1 in Section 2 for a geometrical view of the basic idea.
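
The sketch below illustrates the two phases under stated assumptions: k-means stands in for the clustering algorithm [4], the cluster counts `n_correct` and `n_wrong` are arbitrary placeholders, and the classifiers are assumed to expose a scikit-learn-style `predict` method. It is a minimal illustration of the idea, not the paper's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def m3cs_fit(classifiers, X, y, n_correct=3, n_wrong=3):
    """Clustering phase: build one partition per classifier and record the
    classifier's accuracy in every region of its own partition."""
    models = []
    for clf in classifiers:
        correct = clf.predict(X) == y
        # Cluster the correctly and incorrectly classified samples
        # separately; the union of the centroids defines the partition.
        centers = []
        for mask, k in ((correct, n_correct), (~correct, n_wrong)):
            k = min(k, int(mask.sum()))
            if k > 0:
                km = KMeans(n_clusters=k, n_init=10).fit(X[mask])
                centers.append(km.cluster_centers_)
        centers = np.vstack(centers)
        # Assign every training sample to its nearest centroid and compute
        # the classifier's accuracy within each resulting region.
        region = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        acc = np.array([correct[region == r].mean() if np.any(region == r) else 0.0
                        for r in range(len(centers))])
        models.append((centers, acc))
    return models

def m3cs_predict(classifiers, models, x):
    """Selection (operation) phase: the classifier that is most accurate in
    the region containing x makes the final decision."""
    scores = [acc[np.linalg.norm(centers - x, axis=1).argmin()]
              for centers, acc in models]
    best = int(np.argmax(scores))
    return classifiers[best].predict(x.reshape(1, -1))[0]
```

Under these assumptions, `m3cs_fit` plays the role of the clustering step and `m3cs_predict` the selection step; note that, unlike DCS-LA, all accuracy estimation happens once at training time.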

This paper is arranged as follows: first, the basic idea of the algorithm is introduced, and then the iteration procedure is presented in detail. In Section 3, the data sets used in the experiments are briefly described. The experimental results are given in Section 4, and we conclude in Section 5.

Section snippets

Feature space partition and classifier selection

The basic idea of M3CS may be illustrated with an example in which two trained classifiers, denoted F1 and F2, are available in the committee. From these classifiers, two partitions of the feature space, as shown in Fig. 1, can be obtained. Under F1, the feature space is divided into six regions, say R1,R2,…,R6, and the classification accuracy of F1 in each of these six regions is computed using the training or validation data. From F2, however, seven clusters are obtained, and the performances

Data used

To evaluate the performance of M3CS, we select four data sets: Clouds, Phoneme, Satimage, and Waveform, from the ELENA project as well as the UCI data set repository [13]. These data sets are carefully selected so

Experiments and analysis

A series of experiments has been carried out to verify the performance of M3CS and to compare it with other aggregation methods such as average, maximum, minimum, majority vote, and Kuncheva's CS+DT method. All these experiments use the four data sets introduced in the previous section, where the first 1500 samples of each data set are used for training and the remaining samples for testing. All the involved parameters, such as the parameters of the individual
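
A minimal sketch of this evaluation protocol, assuming the data are numpy arrays and each combination method is wrapped in an object with scikit-learn-style `fit`/`predict` methods (both placeholders of ours, not code from the paper):

```python
def evaluate(X, y, combiners):
    """Compare combination schemes under the paper's fixed split."""
    X_tr, y_tr = X[:1500], y[:1500]   # first 1500 samples: training
    X_te, y_te = X[1500:], y[1500:]   # remaining samples: testing
    results = {}
    for name, method in combiners.items():
        method.fit(X_tr, y_tr)
        results[name] = float((method.predict(X_te) == y_te).mean())
    return results
```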

Conclusions

A clustering and selection based multiple classifier combination algorithm (M3CS) is proposed in this paper. First, the training data set is clustered according to each classifier's performance. Then, the output of the classifier that performs best in the vicinity of the input sample is adopted as the final combination result. The performance of the proposed model is verified through experiments. It should be noted that the fusion result is affected by the values of the parameters,


Cited by (51)

  • Multi-Manifold based Rotation Forest for classification

    2018, Applied Soft Computing Journal
Citation excerpt:

Finally, Section 6 draws the conclusion. According to a review of contributions in the field of ELSs, there are five methods for combining and generating the classifiers: CF [4,8], SCS [5,8,24], SES [7,8], DCS [10,11,25] and DES [8,9,24]. CF, SCS, and SES are static classifier ensembles, while DCS and DES are dynamic.

  • Learning simultaneous adaptive clustering and classification via MOEA

    2016, Pattern Recognition
Citation excerpt:

In this way, the classification learning benefits from the clustering learning, but it cannot overcome the difficulty of overtraining. A similar strategy is also adopted in [12]. In [13], a clustering-launched classification (CLC) is proposed.

  • Dynamic selection of the best base classifier in One versus One

    2015, Knowledge-Based Systems
Citation excerpt:

Giacinto and Roli [19] also extend Woods' work, incorporating distance weighting and classifier confidence levels into two new methods called A Priori and A Posteriori. On the other hand, there are also other works that are not based on the K-NN method; for instance, Liu and Yuan [28] propose to use clustering: they divide the feature space into several clusters for each base classifier. The unknown sample is assigned to a cluster for each base classifier, and the classifier of the most accurate cluster is selected to classify the unknown sample.


Supported by the National Science Foundation Committee (No. 69789301).
