Elsevier

Applied Soft Computing

Volume 28, March 2015, Pages 237-249

A new reverse reduce-error ensemble pruning algorithm

https://doi.org/10.1016/j.asoc.2014.10.045

Highlights

  • A Reverse Reduce-Error (RRE) pruning algorithm incorporating a subtraction operation is proposed in this work.

  • The Worst Single Model (WSM) is chosen, and its votes are subtracted from the votes made by the selected components.

  • The backfitting step of the RE algorithm is replaced with the selection step of a WSM in RRE.

  • The problem of ties might be solved more naturally with RRE.

  • A soft voting approach is employed in the testing phase of the RRE algorithm.

Abstract

Although greedy algorithms possess high efficiency, they often yield suboptimal solutions to the ensemble pruning problem, since the area of the solution space they explore is limited to a large extent. Another marked defect of almost all the currently existing ensemble pruning algorithms, including greedy ones, is that they simply abandon all of the classifiers that fail in the competition of ensemble selection, causing a considerable waste of useful resources and information. Inspired by these observations, a greedy Reverse Reduce-Error (RRE) pruning algorithm incorporating a subtraction operation is proposed in this work. The RRE algorithm makes the best of the defeated candidate networks: the Worst Single Model (WSM) is chosen, and its votes are subtracted from the votes made by the selected components within the pruned ensemble. The rationale is that, in most cases, the WSM is likely to be mistaken in its estimation for the test samples. In addition, different from the classical RE, the near-optimal solution is produced based on the pruned error of all the available sequential subensembles. Besides, the backfitting step of the RE algorithm is replaced in RRE with the selection of a WSM. Moreover, the problem of ties can be resolved more naturally with RRE. Finally, a soft voting approach is employed in the testing phase of the RRE algorithm. The performances of the RE and RRE algorithms, together with two baseline methods, i.e., the method that selects the Best Single Model (BSM) of the initial ensemble and the method that retains all member networks of the initial ensemble (ALL), are evaluated on seven benchmark classification tasks under different initial ensemble setups. The results of the empirical investigation show the superiority of RRE over the other three ensemble pruning algorithms.

Introduction

It has been extensively reported in the literature that pooling together complementary classifiers is a desirable strategy for constructing robust classification systems with good generalization performance [1], [2], [3]. Remarkable improvements in generalization performance have been observed from ensemble learning in a broad scope of applications, for example face recognition [4], optical character recognition [5], scientific image analysis [6], [7], medical diagnosis [8], [9], financial time series prediction [6], military applications [10], and intrusion detection [11].

Despite its remarkable performance, a main disadvantage of ensemble learning is that, generally, a large number of classifiers must be combined to ensure that the error converges to its asymptotic value. This brings on large computational requirements, including the training costs, the storage needs, and the prediction time. Moreover, large communication costs are incurred when the classifiers are distributed over a network. To alleviate these drawbacks, various ensemble pruning algorithms have been proposed [12], [13], [14], [15], [16], [17], [18], [19], [20].

However, the problem of selecting the subset of classifiers with the best generalization performance, i.e., ensemble pruning, has been proven to be NP-complete [21], [22]. Assuming that the generalization capability can be estimated based on some quantity measured on the pruning set, selective ensemble is a combinatorial search problem whose complexity grows exponentially with the size of the initial ensemble, since for an ensemble of T base classifiers the number of nonempty subsets is $2^T - 1$. Therefore, finding the exact solution with an enumerative algorithm is infeasible for typical ensemble sizes. To address this problem, it has been proposed to use approximate algorithms that, with high probability, select near-optimal subensembles. In particular, genetic algorithms (GAs) [23], [24] and semidefinite programming (SDP) [25] have been employed to address the ensemble pruning problem. Although the computational complexities of these algorithms are no longer exponential in the size of the initial ensemble, their computational costs are still rather large [26].
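
To make this combinatorial growth concrete, a quick worked calculation (the ensemble sizes chosen here are purely illustrative):

```latex
% Number of nonempty subensembles of an ensemble of T base classifiers: 2^T - 1.
T = 20 \;\Rightarrow\; 2^{20} - 1 = 1\,048\,575, \qquad
T = 100 \;\Rightarrow\; 2^{100} - 1 \approx 1.27 \times 10^{30}.
```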

Several efficient ensemble pruning algorithms based on a greedy selection strategy in the space of subensembles report good classification results and desirable generalization performance [12], [13], [14], [17]. These algorithms start with an empty (or full) initial ensemble and explore the space of different subsets by iteratively broadening (or shrinking) the current subensemble by an individual classifier. The greedy exploration is guided by an evaluation measure in terms of either the classification accuracy or the diversity of the candidate subsets; this measure is the main ingredient of a greedy ensemble pruning algorithm and differentiates the algorithms that fall into this category. Some evaluation measures that have been successfully employed to guide the greedy selection process include Reduce-Error (RE) pruning [14], Kappa pruning [14], Complementarity Measure (CC) [17], Margin Distance Minimization (MDSQ) [17], Orientation Ordering (OO) [18], Boosting-Based pruning (BB) [27], and Uncertainty Weighted Accuracy (UWA) [28]. A more informative and thorough literature survey of existing ensemble pruning algorithms is given in Section 2 of this paper.
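
To illustrate the general scheme that these methods share, the following is a minimal Python sketch of accuracy-guided greedy forward selection in the spirit of Reduce-Error pruning; the array layout, helper names, and the 0/1 majority-vote error measure are assumptions made for this sketch, not code from any of the cited papers.

```python
import numpy as np

def majority_error(votes, y):
    """0/1 error of simple majority voting; votes is a (t, N) array of
    integer class labels, y the (N,) true labels (layout assumed here)."""
    pred = np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)
    return np.mean(pred != y)

def greedy_forward_order(votes, y):
    """Reorder the T classifiers greedily: at each step, append the classifier
    whose inclusion yields the lowest majority-vote error of the growing
    subensemble on the pruning set."""
    remaining = list(range(votes.shape[0]))
    order = []
    while remaining:
        errs = [majority_error(votes[order + [t]], y) for t in remaining]
        best = remaining[int(np.argmin(errs))]
        order.append(best)
        remaining.remove(best)
    return order
```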

Although greedy algorithms possess relatively high efficiency, as they only consider a very small subspace of the entire solution space, they might obtain suboptimal solutions to the ensemble pruning problem [12], [13], [14], [17], [21], [27]. Besides, another notable disadvantage of almost all the existing ensemble pruning algorithms, including greedy ones, is that they hastily discard all of the classifiers that are not selected into the pruned ensemble, causing a waste of useful resources and information. All the models in the initial ensemble are generated through a training phase, which requires rather heavy consumption of computing time and storage space. Moreover, in most cases, the classifiers that are not selected into the pruned ensemble, and are therefore abandoned, constitute the majority of the component networks in the initial ensemble. From this point of view, the waste of resources and information is considerable.

To ameliorate the above-mentioned drawbacks, a greedy Reverse Reduce-Error (RRE) pruning algorithm incorporating a subtraction operation is proposed in this paper. Different from all the other currently existing ensemble pruning algorithms, in RRE the classifiers not selected into the pruned ensemble are not simply and hastily abandoned. Instead, the RRE algorithm makes the best of the defeated candidate networks: the Worst Single Model (WSM), i.e., the defeated network with the largest pruned error, is selected. Then, in the testing phase of the pruned ensemble, the opinion of the WSM is taken into account: the votes made by the WSM are subtracted from the votes made by the selected components within the pruned ensemble. The rationale of this vote subtraction is that, in most cases, the WSM is likely to make a wrong predictive decision for the test sample. Therefore, subtracting the votes made by the WSM from those made by the selected members is expected to further increase the classification accuracy and enhance the generalization performance of the pruned ensemble.

In the classical Reduce-Error (RE) pruning algorithm, the final selection operation is implemented according to the desired amount of pruning, where the first υ classifiers in the reordered sequence are selected [14]. The subensemble selection operation of the proposed RRE algorithm is implemented differently: it is based upon the validated error of all the available sequential subensembles on the pruning set, as sketched below.
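
A minimal sketch of this selection rule, under the same illustrative assumptions as above (hypothetical helper names; 0/1 majority-vote error on the pruning set):

```python
import numpy as np

def majority_error(votes, y):
    # 0/1 error of simple majority voting on the pruning set (assumed layout).
    pred = np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)
    return np.mean(pred != y)

def best_prefix(votes, y, order):
    """Evaluate every sequential subensemble (prefix of the reordered
    sequence) on the pruning set and keep the one with the lowest error,
    instead of cutting at a preset fraction as in classical RE."""
    errs = [majority_error(votes[order[:k]], y)
            for k in range(1, len(order) + 1)]
    return order[:int(np.argmin(errs)) + 1]
```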

In the original RE algorithm, backfitting is applied after each incorporation of a new component network. Backfitting sequentially attempts to replace one of the selected networks by a network not yet included in the subensemble. A replacement is made if a network that reduces the subensemble's validated error is found in the pool of unselected networks. If one or more networks are replaced, backfitting is applied repeatedly, with a limit of 100 iterations. It has been reported that, in bagging ensembles, when the training set is used as the pruning set, backfitting does not significantly reduce the generalization error of the pruned ensembles [18]. Another severe defect of backfitting is that it significantly increases the computational cost of the original RE pruning algorithm [26]. In contrast, in the RRE pruning algorithm, backfitting is replaced with the selection of a WSM based on its performance on the pruning set.
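
The following is one possible reading of the backfitting step in code; the helper names and data layout are assumptions for illustration, since the original papers give no implementation.

```python
import numpy as np

def majority_error(votes, y):
    # 0/1 error of simple majority voting on the pruning set (assumed layout).
    pred = np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)
    return np.mean(pred != y)

def backfit(selected, pool, votes, y, max_iters=100):
    """Repeatedly try to swap each selected classifier for an unselected one
    whenever the swap lowers the subensemble's pruning-set error; stop when a
    full pass makes no replacement or after max_iters passes.
    `selected` and `pool` are lists of classifier indices into `votes`."""
    for _ in range(max_iters):
        improved = False
        for i in range(len(selected)):
            base = majority_error(votes[selected], y)
            for c in list(pool):
                trial = selected[:i] + [c] + selected[i + 1:]
                if majority_error(votes[trial], y) < base:
                    pool.append(selected[i])   # demote the replaced network
                    pool.remove(c)             # promote the replacement
                    selected[i] = c
                    improved = True
                    break
        if not improved:
            break
    return selected
```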

In some works, e.g. [26], ties are resolved by discarding the votes of the last classifiers included in the ensemble, one at a time, until the tie is broken. In the proposed RRE algorithm, the votes made by the WSM are subtracted, which may resolve ties more reasonably and naturally.

Besides, in the testing phase of the RRE algorithm, a soft voting technique is employed [29]: the voting process is implemented by summing the outputs of all the selected models and then subtracting the output of the WSM, and the final classification decision is made according to the result of this soft voting.
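
A minimal sketch of this test-phase rule, assuming each network outputs class probabilities and adopting an illustrative (models, samples, classes) array layout:

```python
import numpy as np

def rre_soft_vote(probs_selected, probs_wsm):
    """Sum the soft outputs of the selected networks, subtract the soft
    output of the Worst Single Model, and predict the arg-max class.

    probs_selected : (S, N, K) soft outputs of the S selected networks
    probs_wsm      : (N, K) soft output of the WSM
    (shapes are an assumption made for this sketch)
    """
    scores = probs_selected.sum(axis=0) - probs_wsm  # summation, then subtraction
    return scores.argmax(axis=1)

# Toy usage with random soft outputs (illustration only):
rng = np.random.default_rng(0)
sel = rng.dirichlet(np.ones(3), size=(5, 10))  # 5 models, 10 samples, 3 classes
wsm = rng.dirichlet(np.ones(3), size=10)       # WSM outputs for the same samples
print(rre_soft_vote(sel, wsm))
```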

Moreover, we proposed a novel Ensemble Pruning method based on the BackTracking algorithm (EnPBT) in our previous work [30]. In comparison with other pruning methods, including greedy algorithms, the pruned ensemble obtained with the EnPBT algorithm generally possesses significantly stronger classification and generalization performance. However, EnPBT has an obvious defect: the definition of its solution space contains many redundant solution vectors, and consequently its solution space tree does as well. This causes a number of redundant explorations in the EnPBT algorithm and ultimately degrades its overall search efficiency.

To remedy this defect of the EnPBT algorithm, we further proposed a Modified Backtracking Ensemble Pruning algorithm (ModEnPBT) [30]. In contrast, its solution space is compact, with no repeated solution vectors, and its solution space tree is accordingly concise, containing no redundant solution vectors. Therefore, ModEnPBT possesses relatively higher search efficiency than EnPBT.

Compared with the RRE pruning algorithm proposed in this work, the EnPBT and ModEnPBT algorithms aim at improving the classification and generalization performance of some state-of-the-art pruning methods, including greedy algorithms, by means of the backtracking algorithm [30], whereas the RRE pruning algorithm aims at ameliorating the specific drawbacks of the classical RE pruning algorithm. Consequently, their design motivations and starting points are entirely different. With regard to performance, the three algorithms are comparable: in some cases EnPBT or ModEnPBT performs best [30], while in other cases the RRE pruning algorithm performs best. One can choose among these three algorithms according to the actual application problem to be addressed.

The remainder of this paper is structured as follows. A thorough discussion of the existing literature on ensemble pruning is provided in Section 2. Section 3 briefly reviews the classical Reduce-Error (RE) ensemble pruning algorithm. Section 4 presents the proposed Reverse Reduce-Error (RRE) pruning algorithm. Section 5 reports the results of the experimental study, from which the final conclusions are drawn in Section 6.

Section snippets

A literature survey

During the past decade, many effective ensemble pruning approaches have been proposed. Roughly speaking, these approaches can be classified into four categories: (1) ordering-based pruning; (2) clustering-based pruning; (3) optimization-based pruning; (4) other pruning methods [31], [32].

In the following description of the main characteristics of each category, the original ensemble is denoted as $ENS = \{n_t(x)\}_{t=1}^{T}$, and the pruning set is denoted as $Pr = \{(x_i, y_i),\ i = 1, 2, \ldots, N_{Pr}\}$, where $x_i$

Reduce-Error (RE) ensemble pruning algorithm

Before a detailed presentation of the Reduce-Error (RE) pruning algorithm, it is necessary to introduce some notations. Ensemble methods generate a collection of classifiers during the training phase, which are combined to produce a final decision by weighted or simple majority voting, stacking, or some other combination approach in the test phase. The result of combining the decisions of the classifiers in an ensemble $ENS = \{n_t(x)\}_{t=1}^{T}$ using simple majority voting is

$$H_{ENS}(x) = \arg\max_{y} \sum_{t=1}^{T} I(n_t(x) = y),$$

where $I(\cdot)$ denotes the indicator function.
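
In code, simple majority voting can be sketched as follows, assuming integer class labels in {0, …, K − 1} and a (T, N) array of votes (a layout chosen for illustration, not prescribed by the paper):

```python
import numpy as np

def majority_vote(votes):
    """H_ENS(x) = argmax_y sum_t I(n_t(x) = y): for each sample, count the
    votes per class and return the most-voted label."""
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)

# Example: 3 classifiers, 4 samples
votes = np.array([[0, 1, 2, 1],
                  [0, 1, 1, 1],
                  [2, 0, 1, 1]])
print(majority_vote(votes))  # -> [0 1 1 1]
```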

The rationale of RRE pruning algorithm

The first stage of the Reverse Reduce-Error (RRE) pruning algorithm is identical to the component-network reordering process carried out in the classical Reduce-Error (RE) pruning algorithm [14]. The original random order of the classifiers in the initial ensemble is rearranged so that the classifiers that are expected to perform better when combined are aggregated first. In the classical RE, the final selection operation is implemented according to the desired amount of pruning, where

The task of blocks classification [45]

The dataset was donated by Donato Malerba of the Dipartimento di Informatica, University of Bari, Italy. This dataset has been used to try different simplification methods for decision trees; a summary of the results can be found in [46]. The problem consists of classifying all the blocks of the page layout of a document detected by a segmentation process. This is an essential step in document analysis for separating text from graphic areas. Indeed, the five classes are: text,

Conclusion

A marked limitation of almost all the existing ensemble pruning algorithms, including greedy ones, is that they simply cast away all of the candidate classifiers that failed to be chosen into the pruned ensemble, causing a considerable waste of useful resources and information, as all the components in the initial ensemble are generated through training, which requires heavy consumption of execution time and storage space.

To ameliorate this limitation, a greedy

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grants no. 61473150, 61100108, and 61375021. It is also supported by the Natural Science Foundation of Jiangsu Province of China under Grant no. SBK201322136, and is supported by the “Fundamental Research Funds for the Central Universities,” no. NZ2013306, and the Qing Lan Project, no. YPB13001. We would like to express our appreciation for the valuable comments from reviewers and editors.

References (55)

  • Q. Dai, A novel ensemble pruning algorithm based on randomized greedy selective strategy and ballot, Neurocomputing (2013).

  • Q. Dai, A competitive ensemble pruning approach based on cross-validation technique, Knowl.-Based Syst. (2013).

  • Q. Dai, An efficient ensemble pruning algorithm using one-path and two-trips searching approach, Knowl.-Based Syst. (2013).

  • B.P. Roe et al., Boosted decision trees as an alternative to artificial neural networks for particle identification, Nucl. Instrum. Methods Phys. Res. (2005).

  • Q. Dai et al., The build of n-bits binary coding ICBP ensemble system, Neurocomputing (2011).

  • L.K. Hansen et al., Neural network ensembles, IEEE Trans. Pattern Anal. Mach. Intell. (1990).

  • A. Krogh et al., Neural network ensembles, cross validation, and active learning.

  • L. Kuncheva et al., Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy, Machine Learning (2003).

  • F.J. Huang et al., Pose Invariant Face Recognition (2000).

  • L.K. Hansen et al., Ensemble Methods for Handwritten Digit Recognition (1992).

  • Y. Zhao et al., A Survey of Neural Network Ensemble (2005).

  • K.J. Cherkauer, Human Expert Level Performance on a Scientific Image Analysis Task by a System Using Combined Artificial Neural Networks (1996).

  • R. Caruana et al., Ensemble Selection from Libraries of Models (2004).

  • D. Margineantu et al., Pruning Adaptive Boosting (1997).

  • G. Giacinto et al., Design of effective multiple classifier systems by clustering of classifiers (2000).

  • W. Fan et al., Pruning and Dynamic Scheduling of Cost-sensitive Ensembles (2002).

  • G. Martinez-Munoz et al., Aggregation Ordering in Bagging (2004).