Information Fusion, Volume 20, November 2014, Pages 49-59

Convex ensemble learning with sparsity and diversity

https://doi.org/10.1016/j.inffus.2013.11.003

Abstract

Classifier ensemble has been broadly studied in two prevalent directions, i.e., diversely generating classifier components and sparsely combining multiple classifiers. While most current approaches emphasize either sparsity or diversity only, in this paper we investigate classifier ensembles that address both. We formulate classifier ensemble learning with both sparsity and diversity in a general mathematical framework, which proves beneficial for grouping classifiers. In particular, derived from the error-ambiguity decomposition, we design a convex ensemble diversity measure. Consequently, accuracy loss, sparseness regularization, and diversity measure can be balanced and combined in a convex quadratic programming problem. We prove that the final convex optimization leads to a closed-form solution, making it very appealing for real ensemble learning problems. We compare our proposed method with other conventional ensemble methods, such as Bagging, least squares combination, sparsity learning, and AdaBoost, extensively on a variety of UCI benchmark data sets and the Pascal Large Scale Learning Challenge 2008 webspam data. Experimental results confirm that our approach has very promising performance.

Introduction

A variety of classifiers with different feature representations, construction architectures, learning algorithms, or training data sets usually exhibit different and complementary classification behaviors. Combining their classification results can often yield higher performance than the best individual classifier. Consequently, classifier ensemble has been intensively studied for a long period [1], [2], [3], [4], [5]. In this field, many famous models have been proposed [6], [7], [8], [9], [10], [11], [12], [13], [14], [15]. At the same time, classifier ensemble methods have been widely applied in many real-world applications [16], [17]. Generally speaking, besides the accuracy of classifier components, there are two very important issues relevant to the performance of a classifier ensemble: (1) how to generate diverse classifiers; and (2) how to combine available multiple classifiers.

On the one hand, diversity learning for an ensemble is performed in two approaches, i.e., seeking implicit or explicit diversity [18]. The common way for the former approach is to train individual classifiers on different training sets, for example Bagging [19] and Boosting [20], [21]. For the latter approach, the general way is to train multiple classifiers by using different classifier architectures or different feature sets [1], [22], [23]. On the other hand, there are also numerous strategies for combining multiple classifiers. Some well-known combination methods include averaging (e.g., Bagging [19]), weighting (e.g., Boosting [21]), and non-linear combining (e.g., Stacking [24]). Given a number of available component classifiers, many researchers argue that a sparse or pruned ensemble, which combines only a subset of the available component classifiers, may outperform the ensemble of all of them [10], [25], [26], [27], [28], [29].
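To make these combination strategies concrete, the following toy sketch (with made-up component outputs, not results from any of the cited methods) contrasts simple averaging, weighted combination, and a sparse/pruned combination:

import numpy as np

# Hypothetical outputs of 5 component classifiers on 4 binary-labelled instances (scores in [-1, 1]).
H = np.array([[ 0.9, -0.2,  0.4, -0.8],
              [ 0.7,  0.1, -0.3, -0.6],
              [-0.1, -0.4,  0.6, -0.9],
              [ 0.8, -0.5,  0.2, -0.7],
              [ 0.6,  0.3, -0.1, -0.5]])

avg_pred = np.sign(H.mean(axis=0))          # simple averaging, as in Bagging
w = np.array([0.3, 0.25, 0.1, 0.25, 0.1])   # per-classifier weights, as in weighted voting/Boosting
weighted_pred = np.sign(w @ H)              # weighted combination
w_sparse = np.where(w >= 0.2, w, 0.0)       # sparse/pruned ensemble: drop low-weight classifiers
sparse_pred = np.sign(w_sparse @ H)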

Diversity learning and sparsity learning for classifier ensemble have different purposes and algorithmic treatments. Consequently, algorithms implementing these different learning strategies have initially been separate and independent. It is arguably more rational to combine classifiers with both sparsity and diversity. However, few researchers have focused on such techniques for ensemble learning. Chen et al. analyzed diversity and regularization in neural network ensembles, balancing diversity, regularization, and accuracy as multiple objectives [12], [30]. Their methods were specifically designed for component classifier training and combination in neural network ensembles.

In this paper, for a general classifier ensemble with numerous available component classifiers, we formulate the sparsity and diversity learning problem in a general mathematical framework. In particular, derived from the error-ambiguity decomposition, we design a convex ensemble diversity measure. Consequently, accuracy loss, sparseness regularization, and diversity measure can be balanced and combined in a convex quadratic programming problem. We prove that the final convex optimization leads to a closed-form solution.
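For reference, the error-ambiguity decomposition that motivates this diversity term, in its standard Krogh-Vedelsby form for a convex combination $f(a)=\sum_{n} w_n h_n(a)$ with $w_n \ge 0$ and $\sum_n w_n = 1$ (the paper's exact variant may differ), reads

$$\bigl(f(a)-y\bigr)^2 \;=\; \underbrace{\sum_{n=1}^{N} w_n \bigl(h_n(a)-y\bigr)^2}_{\bar{E}:\ \text{weighted component error}} \;-\; \underbrace{\sum_{n=1}^{N} w_n \bigl(h_n(a)-f(a)\bigr)^2}_{\bar{A}:\ \text{weighted ambiguity (diversity)}},$$

so lowering the ensemble error amounts to keeping the components accurate while encouraging them to disagree with the combined output.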

The main contributions of this work are summarized as follows. First, we present a general mathematical framework for learning both sparsity and diversity in classifier ensemble. Unlike conventional methods that treat sparsity and/or diversity only implicitly, our approach explicitly combines and optimizes both in a unified learning model. Second, derived from the error-ambiguity decomposition, the sparsity and diversity learning can be formulated as a convex quadratic programming problem. Distinct from conventional methods that rely on heuristic or multi-stage algorithms [31], [32], our approach leads to a closed-form solution, which is highly convenient for real ensemble learning problems.

The rest of this paper is organized as follows. Related work is presented in Section 2. Section 3 describes the problem statement and several learning models for classifier ensemble. Section 4 demonstrates our sparsity and diversity learning algorithm with convex quadratic programming. Comparison experiments on UCI data sets and the Pascal Large Scale Learning Challenge 2008 webspam data are reported in Section 5. Final remarks are presented in Section 6.

Section snippets

Related work

Classifier ensemble methods can be divided into two categories. The first one aims at learning multiple classifiers at the feature level, where multiple classifiers are trained and combined in the learning process, e.g., Boosting [20] and Bagging [19]. The second tries to combine classifiers at the output level, where the results of multiple available classifiers are combined to solve the targeted problem, e.g., multiple classifier systems, or mixture of experts [14]. In this paper, we focus on the

Setting and notation

In ensemble learning for a classification problem, each instance $a$ is associated with a label $y$. To classify an instance $a$ into $K$ classes $\{\omega_1, \ldots, \omega_K\}$, assume that we have $N$ different classifiers (hypotheses) $\{h_1, \ldots, h_N\}$, each using a certain feature vector for $a$. On a processed instance $a$, each classifier $h_n$ outputs the discriminant measure $x_n = h_n(a)$. With all classifiers we get $\mathbf{x} = [x_1, \ldots, x_N]^T$.
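As a minimal illustration of this notation (the component classifiers below are hypothetical stand-ins, not the paper's trained components), the stacked output vector $\mathbf{x}$ can be assembled as follows:

import numpy as np

def stack_outputs(classifiers, a):
    # Stack the discriminant outputs x_n = h_n(a) of N component classifiers
    # into the vector x = [x_1, ..., x_N]^T used for combination.
    return np.array([h(a) for h in classifiers])

# Three toy linear "classifiers" (hypothetical stand-ins) scoring a 2-D instance:
hs = [lambda a, w=np.array(w): float(w @ a)
      for w in ([1.0, -0.5], [0.2, 0.8], [-1.0, 1.0])]
x = stack_outputs(hs, np.array([0.5, 1.5]))   # shape (3,): one discriminant value per classifier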

By classifier ensemble, the decisions of the component classifiers are deferred and the final classification is

Convex ensemble with sparsity and diversity learning

One fundamental aspect of solving Eq. (10) or (11) is how to measure diversity in a classifier ensemble. In this section, we design a convex ensemble diversity measure, which demonstrates a very desirable property for optimization. Furthermore, we show that the final ensemble learning can be elegantly formulated as a convex quadratic programming problem, which leads to a closed-form solution.
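As a rough sketch of why a convex quadratic formulation admits a closed-form solution (the exact loss, diversity matrix, and constraints of Eqs. (10) and (11) are not reproduced here; the objective below is an assumed illustrative stand-in), the combination weights can be obtained by solving a single linear system:

import numpy as np

def closed_form_weights(X, y, lam_sparse=0.1, lam_div=0.05, D=None):
    # Illustrative convex quadratic objective (an assumed stand-in, not the paper's exact model):
    #   min_w  ||X w - y||^2 + lam_sparse * ||w||^2 - lam_div * w^T D w
    # X: (M, N) outputs of N component classifiers on M training instances; y: (M,) targets.
    # D: (N, N) PSD matrix rewarding pairwise disagreement between classifiers.
    # The objective stays convex only while X^T X + lam_sparse*I - lam_div*D is positive definite,
    # in which case setting the gradient to zero yields the closed-form (linear-system) solution below.
    M, N = X.shape
    if D is None:
        D = np.zeros((N, N))
    Q = X.T @ X + lam_sparse * np.eye(N) - lam_div * D
    return np.linalg.solve(Q, X.T @ y)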

Experiments

We perform several classification experiments with classifier ensembles under different experimental settings. First, we present a case study on the relation among accuracy, sparsity, and diversity for our proposed ensemble method. Then, ten data sets from the UCI machine learning repository are used for a 3-fold cross-validation experiment. We also evaluate different methods on a challenging data set, the Pascal Large Scale Learning Challenge 2008 webspam data, with 10-fold cross validation. Moreover,

Conclusion

Classifier ensemble is widely considered an effective technique for improving accuracy and stability over individual classifier components. While most previous ensemble methods focus on diversity or sparsity only, we propose a convex mathematical framework for classifier ensemble that takes both sparsity and diversity into account. The proposed framework leads to a convex quadratic programming problem, which enjoys a nice closed-form solution. In experiments, we compare our sparsity

Acknowledgments

We would like to thank the anonymous reviewers for their constructive comments. The research is partly supported by National Basic Research Program of China (2012CB316301), National Natural Science Foundation of China (61105018 and 61175020), and R&D Special Fund for Public Welfare Industry (Meteorology) of China (GYHY201106039 and GYHY201106047).

References (64)

  • D. Ruta et al., Classifier selection for majority voting, Inf. Fusion (2005)
  • A. Ulas et al., Incremental construction of classifier and discriminant ensembles, Inf. Sci. (2009)
  • L. Kuncheva, Diversity in multiple classifier systems, Inf. Fusion (2005)
  • G. Brown et al., Diversity creation methods: a survey and categorisation, Inf. Fusion (2005)
  • R.E. Banfield et al., Ensemble diversity measures and their application to thinning, Inf. Fusion (2005)
  • T. Windeatt, Diversity measures for multiple classifier system analysis and design, Inf. Fusion (2005)
  • T. Ho et al., Decision combination in multiple classifier systems, IEEE Trans. Pattern Anal. Mach. Intell. (1994)
  • J. Kittler et al., On combining classifiers, IEEE Trans. Pattern Anal. Mach. Intell. (1998)
  • T. Dietterich, Ensemble methods in machine learning, in: Proceedings of International Workshop on Multiple Classifier...
  • L. Kuncheva et al., Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy, Mach. Learn. (2003)
  • Z.-H. Zhou, Ensemble Methods: Foundations and Algorithms (2012)
  • L.K. Hansen et al., Neural network ensembles, IEEE Trans. Pattern Anal. Mach. Intell. (1990)
  • A. Krogh, J. Vedelsby, Neural network ensembles, cross validation, and active learning, in: Advances in Neural...
  • H. Schwenk et al., Boosting neural networks, Neural Comput. (2000)
  • H. Chen et al., Regularized negative correlation learning for neural network ensembles, IEEE Trans. Neural Networks (2009)
  • H. Chen et al., Multiobjective neural network ensembles based on regularized negative correlation learning, IEEE Trans. Knowl. Data Eng. (2009)
  • P. Castro et al., Learning ensembles of neural networks by means of a Bayesian artificial immune system, IEEE Trans. Neural Networks (2011)
  • S.E. Yuksel et al., Twenty years of mixture of experts, IEEE Trans. Neural Networks Learn. Syst. (2012)
  • C. Oza et al., Applications of ensemble methods, Inf. Fusion (2008)
  • E. Tang et al., An analysis of diversity measures, Mach. Learn. (2006)
  • L. Breiman, Bagging predictors, Mach. Learn. (1996)
  • R. Schapire, The strength of weak learnability, Mach. Learn. (1990)