
Pattern Recognition

Volume 39, Issue 3, March 2006, Pages 488-490

Rapid and brief communication
FuzzyBagging: A novel ensemble of classifiers

https://doi.org/10.1016/j.patcog.2005.10.002

Abstract

In this work, a new method for the creation of classifier ensembles is introduced. The patterns are partitioned into clusters so that similar patterns are grouped together, and a training set is built from the patterns that belong to each cluster. Each of the new sets is used to train a classifier. We show that the approach presented here, called FuzzyBagging, obtains better performance than Bagging.

Introduction

Multiclassifier systems are special cases of approaches that integrate several data-driven models for the same problem. A key goal is to obtain a better composite global model, with more accurate and reliable estimates. The combination methods proposed in the literature are based on “voting” rules, statistical techniques, belief functions, and other “classifier fusion” schemes. As an example, the “majority” voting rule interprets each classification result as a “vote” for one of the data classes and assigns the input pattern to the class receiving the majority of votes. Such methods assume that, for each pattern, different classifiers make different classification errors. Recently, a number of classifier combination methods, called ensemble methods, have been proposed in the field of machine learning [1]. Given a single classifier, called the base classifier, a set of classifiers can be automatically generated. In this paper, we propose a new ensemble creation method that is based on Bagging [1] and Fuzzy C-Means clustering [2]. Bagging, given a training set S of size n, generates K new training sets S1, …, SK, each of size n, by randomly drawing elements of the original training set with replacement, so the same element may be drawn multiple times. In this paper, the new training sets S1, …, SK are instead generated by dividing the training set S into K overlapping clusters; the training set Si is built using the patterns that belong to the ith cluster. We train a classifier on each training set Si, classify each pattern of the test set with these classifiers, and combine their outputs with the “max rule”.
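
For reference, the bootstrap step of standard Bagging can be sketched as follows. This is a minimal illustration, not the authors' code: the decision-tree base classifier and the function name are our own assumptions, since the paper does not fix them here.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_train(X, y, K, rng=None):
        # Train K base classifiers, each on a size-n sample drawn with
        # replacement from the original training set S (bootstrap sampling).
        rng = np.random.default_rng(0) if rng is None else rng
        n = len(X)
        classifiers = []
        for _ in range(K):
            idx = rng.choice(n, size=n, replace=True)  # duplicates allowed
            classifiers.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return classifiers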

System

The new training sets S1, …, SK are generated by partitioning the training set S into K clusters; a different training set Si is built using the patterns that belong to the ith cluster. We assign to cluster i the 63.2% of patterns of each class with the highest membership to that cluster.

We have chosen this value since, in “base” Bagging, if the probability of being drawn is uniformly distributed over the training set S, 63.2% is the expected percentage of distinct training elements contained in each modified training set: a bootstrap sample of size n contains on average a fraction 1 - (1 - 1/n)^n ≈ 1 - 1/e ≈ 0.632 of the original elements.
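
The construction can be sketched as follows, under our reading of this section: a plain Fuzzy C-Means pass yields a membership matrix, and each training set Si keeps, for every class, the 63.2% of patterns with the highest membership to cluster i. The Euclidean distance, the fuzzifier m = 2, and all function names are illustrative assumptions, not details fixed by the paper.

    import numpy as np

    def fcm_memberships(X, K, m=2.0, n_iter=100, rng=None):
        # Plain Fuzzy C-Means: returns the (n, K) membership matrix U.
        rng = np.random.default_rng(0) if rng is None else rng
        U = rng.random((len(X), K))
        U /= U.sum(axis=1, keepdims=True)                 # rows sum to 1
        for _ in range(n_iter):
            W = U ** m                                     # fuzzified weights
            centers = (W.T @ X) / W.sum(axis=0)[:, None]   # weighted means
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
            inv = d ** (-2.0 / (m - 1.0))
            U = inv / inv.sum(axis=1, keepdims=True)       # standard FCM update
        return U

    def fuzzy_training_sets(X, y, K, frac=0.632):
        # For each cluster i, keep the `frac` patterns of each class with the
        # highest membership to that cluster; subsets overlap across clusters.
        U = fcm_memberships(X, K)
        subsets = []
        for k in range(K):
            keep = []
            for c in np.unique(y):
                cls = np.flatnonzero(y == c)
                order = cls[np.argsort(U[cls, k])[::-1]]   # descending membership
                keep.extend(order[:max(1, int(frac * len(cls)))])
            subsets.append(np.array(keep))
        return subsets  # index sets for S1, ..., SK

Selecting per class (rather than per cluster as a whole) keeps every class represented in every subset, which also lets the base classifiers be combined class-by-class later.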

Experimental results

We conducted our experiments on 4 datasets of varying complexity, 3 coming from the UCI Repository of Machine Learning Databases [3] and the remaining one (NIST4) from [4]. A summary of the characteristics of these datasets (number of attributes, number of examples, number of classes) is reported in Table 1. For all the datasets but NIST4, the same testing protocol has been adopted: we averaged the results over 10 tests, each time randomly resampling the training and test sets (each containing half of the patterns).
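
A sketch of this protocol, combining the base classifiers with the “max rule” described in the Introduction and reusing fuzzy_training_sets from the sketch above; the decision-tree base classifier, K = 10 clusters, and stratified splitting are our assumptions for illustration only.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    def max_rule_predict(classifiers, X):
        # Max rule: for each test pattern take, per class, the maximum
        # posterior over all base classifiers, then assign the class with
        # the largest value. Assumes every classifier saw every class, so
        # the predict_proba columns line up (the per-class selection in
        # fuzzy_training_sets guarantees this).
        probs = np.stack([clf.predict_proba(X) for clf in classifiers])  # (K, n, C)
        return classifiers[0].classes_[probs.max(axis=0).argmax(axis=1)]

    def average_error(X, y, K=10, n_runs=10):
        # Average test error over n_runs random half/half resamplings.
        errors = []
        for run in range(n_runs):
            Xtr, Xte, ytr, yte = train_test_split(
                X, y, test_size=0.5, random_state=run, stratify=y)
            subsets = fuzzy_training_sets(Xtr, ytr, K)
            clfs = [DecisionTreeClassifier().fit(Xtr[s], ytr[s]) for s in subsets]
            errors.append(np.mean(max_rule_predict(clfs, Xte) != yte))
        return float(np.mean(errors))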

Conclusions

In this work, we propose to use Fuzzy C-Means to obtain several training sets and then train a classifier on each training set. We combine the outputs of the classifiers using the “max rule”. The experimental results show that our new approach is a successful attempt to obtain an error reduction with respect to the performance of standard Bagging. We plan in the future to evaluate the performance of our FuzzyBagging as a function of the percentage of training elements contained in each modified training set.

Acknowledgements

This work has been supported by Italian PRIN prot. 2004098034 and by European Commission IST-2002-507634 Biosecure NoE projects.

References (5)

  • L. Breiman, Bagging predictors, Mach. Learn. (1996)
  • J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms (1981)
There are more references available in the full text version of this article.
