Information Fusion, Volume 20, November 2014, Pages 49-59

Convex ensemble learning with sparsity and diversity

https://doi.org/10.1016/j.inffus.2013.11.003

Abstract

Classifier ensemble has been broadly studied in two prevalent directions, i.e., diversely generating classifier components and sparsely combining multiple classifiers. While most current approaches emphasize either sparsity or diversity only, in this paper we investigate classifier ensembles that address both. We formulate classifier ensemble learning with both sparsity and diversity in a general mathematical framework, which proves beneficial for grouping classifiers. In particular, derived from the error-ambiguity decomposition, we design a convex ensemble diversity measure. Consequently, accuracy loss, sparseness regularization, and diversity measure can be balanced and combined in a convex quadratic programming problem. We prove that the final convex optimization leads to a closed-form solution, making it very appealing for real ensemble learning problems. We compare our proposed method with other conventional ensemble methods, such as Bagging, least squares combination, sparsity learning, and AdaBoost, extensively on a variety of UCI benchmark data sets and the Pascal Large Scale Learning Challenge 2008 webspam data. Experimental results confirm that our approach has very promising performance.

Introduction

A variety of classifiers with different feature representations, construction architectures, learning algorithms, or training data sets usually exhibit different and complementary classification behaviors. Combining their classification results can often yield higher performance than the best individual classifier. Consequently, classifier ensemble has been intensively studied for a long period [1], [2], [3], [4], [5]. In this field, many famous models have been proposed [6], [7], [8], [9], [10], [11], [12], [13], [14], [15]. At the same time, classifier ensemble methods have been widely applied in many real-world applications [16], [17]. Generally speaking, besides the accuracy of classifier components, there are two very important issues relevant to the performance of a classifier ensemble: (1) how to generate diverse classifiers; and (2) how to combine available multiple classifiers.

On the one hand, diversity learning for an ensemble is performed in two approaches, i.e., seeking implicit or explicit diversity [18]. The common way for the former approach is to train individual classifiers on different training sets, for example Bagging [19] and Boosting [20], [21]. For the latter approach, the general way is to train multiple classifiers by using different classifier architectures or different feature sets [1], [22], [23]. On the other hand, there are also numerous strategies for combining multiple classifiers. Some well-known combination methods include averaging (e.g., Bagging [19]), weighting (e.g., Boosting [21]), and non-linear combining (e.g., Stacking [24]). Given a number of available component classifiers, many researchers argue that a sparse or pruned ensemble, which combines only a subset of the available component classifiers, may outperform the ensemble of all of them [10], [25], [26], [27], [28], [29].
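To make these combination strategies concrete, the following toy sketch (with made-up component outputs, not results from any of the cited methods) contrasts simple averaging, weighted combination, and a sparse/pruned combination:

import numpy as np

# Hypothetical outputs of 5 component classifiers on 4 binary-labelled instances (scores in [-1, 1]).
H = np.array([[ 0.9, -0.2,  0.4, -0.8],
              [ 0.7,  0.1, -0.3, -0.6],
              [-0.1, -0.4,  0.6, -0.9],
              [ 0.8, -0.5,  0.2, -0.7],
              [ 0.6,  0.3, -0.1, -0.5]])

avg_pred = np.sign(H.mean(axis=0))          # simple averaging, as in Bagging
w = np.array([0.3, 0.25, 0.1, 0.25, 0.1])   # per-classifier weights, as in weighted voting/Boosting
weighted_pred = np.sign(w @ H)              # weighted combination
w_sparse = np.where(w >= 0.2, w, 0.0)       # sparse/pruned ensemble: drop low-weight classifiers
sparse_pred = np.sign(w_sparse @ H)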

Diversity learning and sparsity learning for classifier ensemble have different purposes and algorithmic treatments. Consequently, algorithms implementing these different learning strategies have initially been separate and independent. It is arguably more rational to combine classifiers with both sparsity and diversity. However, few researchers have focused on such techniques for ensemble learning. Chen et al. analyzed diversity and regularization in neural network ensembles, balancing diversity, regularization, and accuracy as multiple objectives [12], [30]. Their methods were specifically designed for component classifier training and combination in neural network ensembles.

In this paper, for a general classifier ensemble with numerous available component classifiers, we formulate the sparsity and diversity learning problem in a general mathematical framework. In particular, derived from the error-ambiguity decomposition, we design a convex ensemble diversity measure. Consequently, accuracy loss, sparseness regularization, and diversity measure can be balanced and combined in a convex quadratic programming problem. We prove that the final convex optimization leads to a closed-form solution.
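For reference, the error-ambiguity decomposition that motivates this diversity term, in its standard Krogh-Vedelsby form for a convex combination $f(a)=\sum_{n} w_n h_n(a)$ with $w_n \ge 0$ and $\sum_n w_n = 1$ (the paper's exact variant may differ), reads

$$\bigl(f(a)-y\bigr)^2 \;=\; \underbrace{\sum_{n=1}^{N} w_n \bigl(h_n(a)-y\bigr)^2}_{\bar{E}:\ \text{weighted component error}} \;-\; \underbrace{\sum_{n=1}^{N} w_n \bigl(h_n(a)-f(a)\bigr)^2}_{\bar{A}:\ \text{weighted ambiguity (diversity)}},$$

so lowering the ensemble error amounts to keeping the components accurate while encouraging them to disagree with the combined output.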

The main contributions of this work are summarized as follows. First, we present a general mathematical framework for learning both sparsity and diversity in classifier ensemble. Unlike conventional methods that treat sparsity and/or diversity only implicitly, our approach explicitly combines and optimizes both in a unified learning model. Second, derived from the error-ambiguity decomposition, the sparsity and diversity learning can be formulated as a convex quadratic programming problem. Distinct from conventional methods that rely on heuristic or multi-stage algorithms [31], [32], our approach leads to a closed-form solution, which is highly convenient for real ensemble learning problems.

The rest of this paper is organized as follows. Related work is presented in Section 2. Section 3 describes the problem statement and several learning models for classifier ensemble. Section 4 demonstrates our sparsity and diversity learning algorithm with convex quadratic programming. Comparison experiments on UCI data sets and the Pascal Large Scale Learning Challenge 2008 webspam data are reported in Section 5. Final remarks are presented in Section 6.

Section snippets

Related work

Classifier ensemble methods can be divided into two categories. The first one aims at learning multiple classifiers at the feature level, where multiple classifiers are trained and combined in the learning process, e.g., Boosting [20] and Bagging [19]. The second tries to combine classifiers at the output level, where the results of multiple available classifiers are combined to solve the targeted problem, e.g., multiple classifier systems, or mixture of experts [14]. In this paper, we focus on the

Setting and notation

In ensemble learning for a classification problem, each instance $a$ is associated with a label $y$. To classify an instance $a$ into $K$ classes $\{\omega_1, \ldots, \omega_K\}$, assume that we have $N$ different classifiers (hypotheses) $\{h_1, \ldots, h_N\}$, each using a certain feature vector for $a$. On a processed instance $a$, each classifier $h_n$ outputs the discriminant measure $x_n = h_n(a)$. With all classifiers we get $\mathbf{x} = [x_1, \ldots, x_N]^T$.
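As a minimal illustration of this notation (the component classifiers below are hypothetical stand-ins, not the paper's trained components), the stacked output vector $\mathbf{x}$ can be assembled as follows:

import numpy as np

def stack_outputs(classifiers, a):
    # Stack the discriminant outputs x_n = h_n(a) of N component classifiers
    # into the vector x = [x_1, ..., x_N]^T used for combination.
    return np.array([h(a) for h in classifiers])

# Three toy linear "classifiers" (hypothetical stand-ins) scoring a 2-D instance:
hs = [lambda a, w=np.array(w): float(w @ a)
      for w in ([1.0, -0.5], [0.2, 0.8], [-1.0, 1.0])]
x = stack_outputs(hs, np.array([0.5, 1.5]))   # shape (3,): one discriminant value per classifier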

By classifier ensemble, the decisions of the component classifiers are deferred and the final classification is

Convex ensemble with sparsity and diversity learning

One fundamental aspect of solving Eq. (10) or (11) is how to measure diversity in a classifier ensemble. In this section, we design a convex ensemble diversity measure, which demonstrates a very desirable property for optimization. Furthermore, we show that the final ensemble learning can be elegantly formulated as a convex quadratic programming problem, which leads to a closed-form solution.
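As a rough sketch of why a convex quadratic formulation admits a closed-form solution (the exact loss, diversity matrix, and constraints of Eqs. (10) and (11) are not reproduced here; the objective below is an assumed illustrative stand-in), the combination weights can be obtained by solving a single linear system:

import numpy as np

def closed_form_weights(X, y, lam_sparse=0.1, lam_div=0.05, D=None):
    # Illustrative convex quadratic objective (an assumed stand-in, not the paper's exact model):
    #   min_w  ||X w - y||^2 + lam_sparse * ||w||^2 - lam_div * w^T D w
    # X: (M, N) outputs of N component classifiers on M training instances; y: (M,) targets.
    # D: (N, N) PSD matrix rewarding pairwise disagreement between classifiers.
    # The objective stays convex only while X^T X + lam_sparse*I - lam_div*D is positive definite,
    # in which case setting the gradient to zero yields the closed-form (linear-system) solution below.
    M, N = X.shape
    if D is None:
        D = np.zeros((N, N))
    Q = X.T @ X + lam_sparse * np.eye(N) - lam_div * D
    return np.linalg.solve(Q, X.T @ y)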

Experiments

We perform several classification experiments with classifier ensembles under different experimental settings. First, we present a case study on the relation among accuracy, sparsity, and diversity for our proposed ensemble method. Then, ten data sets from the UCI machine learning repository are used for a 3-fold cross-validation experiment. We also evaluate different methods on a challenging data set, the Pascal Large Scale Learning Challenge 2008 webspam data, with 10-fold cross validation. Moreover,

Conclusion

Classifier ensemble is widely considered an effective technique for improving accuracy and stability over individual classifier components. While most previous ensemble methods focus on diversity or sparsity only, we propose a convex mathematical framework for classifier ensemble that takes both sparsity and diversity into account. The proposed framework leads to a convex quadratic programming problem, which enjoys a nice closed-form solution. In experiments, we compare our sparsity

Acknowledgments

We would like to thank the anonymous reviewers for their constructive comments. The research is partly supported by National Basic Research Program of China (2012CB316301), National Natural Science Foundation of China (61105018 and 61175020), and R&D Special Fund for Public Welfare Industry (Meteorology) of China (GYHY201106039 and GYHY201106047).

References (64)

  • D. Ruta et al., Classifier selection for majority voting, Inf. Fusion (2005)
  • A. Ulas et al., Incremental construction of classifier and discriminant ensembles, Inf. Sci. (2009)
  • L. Kuncheva, Diversity in multiple classifier systems, Inf. Fusion (2005)
  • G. Brown et al., Diversity creation methods: a survey and categorisation, Inf. Fusion (2005)
  • R.E. Banfield et al., Ensemble diversity measures and their application to thinning, Inf. Fusion (2005)
  • T. Windeatt, Diversity measures for multiple classifier system analysis and design, Inf. Fusion (2005)
  • T. Ho et al., Decision combination in multiple classifier systems, IEEE Trans. Pattern Anal. Mach. Intell. (1994)
  • J. Kittler et al., On combining classifiers, IEEE Trans. Pattern Anal. Mach. Intell. (1998)
  • T. Dietterich, Ensemble methods in machine learning, in: Proceedings of International Workshop on Multiple Classifier...
  • L. Kuncheva et al., Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy, Mach. Learn. (2003)
  • Z.-H. Zhou, Ensemble Methods: Foundations and Algorithms (2012)
  • L.K. Hansen et al., Neural network ensembles, IEEE Trans. Pattern Anal. Mach. Intell. (1990)
  • A. Krogh, J. Vedelsby, Neural network ensembles, cross validation, and active learning, in: Advances in Neural...
  • H. Schwenk et al., Boosting neural networks, Neural Comput. (2000)
  • H. Chen et al., Regularized negative correlation learning for neural network ensembles, IEEE Trans. Neural Networks (2009)
  • H. Chen et al., Multiobjective neural network ensembles based on regularized negative correlation learning, IEEE Trans. Knowl. Data Eng. (2009)
  • P. Castro et al., Learning ensembles of neural networks by means of a Bayesian artificial immune system, IEEE Trans. Neural Networks (2011)
  • S.E. Yuksel et al., Twenty years of mixture of experts, IEEE Trans. Neural Networks Learn. Syst. (2012)
  • C. Oza et al., Applications of ensemble methods, Inf. Fusion (2008)
  • E. Tang et al., An analysis of diversity measures, Mach. Learn. (2006)
  • L. Breiman, Bagging predictors, Mach. Learn. (1996)
  • R. Schapire, The strength of weak learnability, Mach. Learn. (1990)