Convex ensemble learning with sparsity and diversity
Introduction
Classifiers with different feature representations, construction architectures, learning algorithms, or training data sets usually exhibit different and complementary classification behaviors. Combining their classification results can often yield higher performance than the best individual classifier. Consequently, classifier ensemble has been intensively studied for a long time [1], [2], [3], [4], [5], and many well-known models have been proposed [6], [7], [8], [9], [10], [11], [12], [13], [14], [15]. At the same time, classifier ensemble methods have been widely applied in many real-world applications [16], [17]. Generally speaking, besides the accuracy of the component classifiers, two issues are particularly important for the performance of a classifier ensemble: (1) how to generate diverse classifiers, and (2) how to combine the available classifiers.
On the one hand, diversity learning for an ensemble follows two approaches: seeking implicit or explicit diversity [18]. The common way for the former is to train individual classifiers on different training sets, as in Bagging [19] and Boosting [20], [21]. For the latter, the general way is to train multiple classifiers using different classifier architectures or different feature sets [1], [22], [23]. On the other hand, there are also numerous strategies for combining multiple classifiers. Well-known combination methods include averaging (e.g., Bagging [19]), weighting (e.g., Boosting [21]), and non-linear combining (e.g., Stacking [24]). Given a number of available component classifiers, many researchers argue that a sparse ensemble or pruned ensemble, which combines only a subset of the available component classifiers, may outperform the ensemble of all of them [10], [25], [26], [27], [28], [29].
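As a toy numeric illustration (the scores below are made up, not taken from any cited method), averaging uses every classifier, while a sparse weight vector prunes some of them:

```python
import numpy as np

# Toy scores: rows are f_i(x) for three classifiers over K = 2 classes.
P = np.array([[0.9, 0.1],
              [0.6, 0.4],
              [0.2, 0.8]])

avg = P.mean(axis=0)               # averaging: the whole ensemble (Bagging-style)
w = np.array([0.7, 0.0, 0.3])      # sparse weights: the second classifier is pruned
weighted = w @ P                   # weighted combination (Boosting-style)
print(avg.argmax(), weighted.argmax())   # class decision under each rule
```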
Diversity learning and sparsity learning for classifier ensemble have different purposes and algorithmic treatments. Consequently, algorithms implementing these two learning strategies have largely been developed separately and independently. It would clearly be more rational to combine classifiers with both sparsity and diversity; however, very few researchers have focused on such techniques for ensemble learning. Chen and Yao analyzed diversity and regularization in neural network ensembles, balancing the multiple objectives of diversity, regularization, and accuracy [12], [30]. Their methods were specifically designed for component classifier training and combination in neural network ensembles.
In this paper, for a general classifier ensemble with numerous available component classifiers, we formulate the sparsity and diversity learning problem in a general mathematical framework. In particular, derived from the error-ambiguity decomposition, we design a convex ensemble diversity measure. Consequently, the accuracy loss, a sparseness regularizer, and the diversity measure can be balanced and combined in a single quadratic programming problem. We prove that the final convex optimization leads to a closed-form solution.
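For readers unfamiliar with the starting point, the error-ambiguity decomposition (due to Krogh and Vedelsby) states that the squared error of a weighted ensemble equals the weighted average error of its members minus their weighted ambiguity. Below is a sketch of the decomposition together with one plausible form of the combined objective; the trade-off parameters $\lambda_1$, $\lambda_2$ and the generic regularizer $\Omega$ are our assumed notation, not the paper's:

```latex
% Error-ambiguity decomposition: ensemble error = average member error - ambiguity.
\[
  E \;=\; \bar{E} - \bar{A}, \qquad
  \bar{A}(w) \;=\; \sum_{i=1}^{N} w_i \,\mathbb{E}_x \big\| f_i(x) - \bar{f}(x) \big\|^2 ,
  \qquad
  \bar{f}(x) \;=\; \sum_{j=1}^{N} w_j f_j(x).
\]
% One plausible combined objective: accuracy loss + sparseness - diversity.
\[
  \min_{w}\; \ell(w) \;+\; \lambda_1\, \Omega(w) \;-\; \lambda_2\, \bar{A}(w)
  \qquad \text{s.t.}\quad \mathbf{1}^{\top} w = 1,\; w \ge 0 .
\]
```

Note that under $\mathbf{1}^{\top} w = 1$, the negative ambiguity $-\bar{A}(w)$ expands to $w^{\top} G w - g^{\top} w$, where $G$ is the Gram matrix of classifier outputs and $g = \mathrm{diag}(G)$; since $G$ is positive semidefinite, this term is a convex quadratic in $w$, which is what lets the whole objective remain convex.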
The main contributions of this work are summarized as follows. First, we present a general mathematical framework for learning both sparsity and diversity in classifier ensemble. Unlike conventional methods with only implicit notions of sparsity and/or diversity, our approach explicitly combines and optimizes both in a unified learning model. Second, derived from the error-ambiguity decomposition, sparsity and diversity learning can be formulated as a convex quadratic programming problem. Distinct from conventional methods that rely on heuristic or multi-stage algorithms [31], [32], our approach leads to a closed-form solution, which is highly convenient for practical ensemble learning problems.
The rest of this paper is organized as follows. Related work is presented in Section 2. Section 3 describes the problem statement and several learning models for classifier ensemble. Section 4 demonstrates our sparsity and diversity learning algorithm with convex quadratic programming. Comparison experiments on UCI data sets and the Pascal Large Scale Learning Challenge 2008 webspam data are reported in Section 5. Final remarks are presented in Section 6.
Related work
Classifier ensemble methods can be divided into two categories. The first aims at learning multiple classifiers at the feature level, where multiple classifiers are trained and combined within the learning process, e.g., Boosting [20] and Bagging [19]. The second combines classifiers at the output level, where the results of multiple available classifiers are combined to solve the targeted problem, e.g., multiple classifier systems, or mixture of experts [14]. In this paper, we focus on the second category, where numerous component classifiers are already available and only their outputs are combined.
Setting and notation
In ensemble learning for a classification problem, each instance is associated with a label $y$. To classify one instance into $K$ classes $\{\omega_1, \ldots, \omega_K\}$, assume that we have $N$ different classifiers (hypotheses) $h_1, \ldots, h_N$, each using a certain feature vector $x^{(i)}$ for $i = 1, \ldots, N$. On a processed instance $x$, each classifier $h_i$ outputs discriminant measures $f_i(x) = [f_{i1}(x), \ldots, f_{iK}(x)]^{\top}$, where $f_{ik}(x)$ is the measure supporting class $\omega_k$. With all classifiers we get $F(x) = [f_1(x), \ldots, f_N(x)]$.
In a classifier ensemble, the decisions of the component classifiers are combined, and the final classification is made from a weighted combination of their outputs, $\bar{f}(x) = \sum_{i=1}^{N} w_i f_i(x)$, with the predicted class given by $\arg\max_{k} \sum_{i=1}^{N} w_i f_{ik}(x)$.
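In code, this decision rule reads as below; a minimal sketch assuming sklearn-style component classifiers that expose per-class scores via predict_proba, with all classifiers seeing the same instance for simplicity (the function name ensemble_decide is ours, not from the paper):

```python
import numpy as np

def ensemble_decide(classifiers, x, w):
    """classifiers: list of N fitted models; x: one instance of shape (1, d);
       w: (N,) combination weights. Returns the predicted class index."""
    # f_i(x): (K,) vector of discriminant measures from classifier i
    F = np.stack([clf.predict_proba(x)[0] for clf in classifiers])  # (N, K)
    scores = w @ F            # weighted combination: sum_i w_i * f_i(x), shape (K,)
    return int(np.argmax(scores))   # final decision: arg max over classes
```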
Convex ensemble with sparsity and diversity learning
One fundamental issue in solving Eq. (10) or (11) is how to measure diversity in a classifier ensemble. In this section, we design a convex ensemble diversity measure, which has a very desirable property for optimization. Furthermore, we show that the final ensemble learning can be elegantly formulated as a convex quadratic programming problem, which leads to a closed-form solution.
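As a concrete but hedged illustration of how such a quadratic program admits a closed-form solution, the sketch below solves the equality-constrained version via its KKT system. This is our reading of the framework, not the authors' exact algorithm: the squared loss, the ridge term standing in for the paper's sparseness regularizer, and the dropped nonnegativity constraint are all simplifying assumptions:

```python
import numpy as np

def convex_ensemble_weights(F, y, lam_sparse=0.1, lam_div=0.1):
    """Minimal sketch of a convex ensemble objective combining accuracy,
    sparsity and diversity (binary case, for simplicity).

    F : (n_samples, N) matrix of classifier scores f_i(x).
    y : (n_samples,) targets in {-1, +1} (or real scores)."""
    n, N = F.shape
    # Accuracy term: squared loss ||F w - y||^2 -> w^T (F^T F) w - 2 (F^T y)^T w
    A = F.T @ F / n
    b = F.T @ y / n
    # Diversity term from the error-ambiguity view: with sum_i w_i = 1, the
    # negative weighted ambiguity equals w^T G w - g^T w, where G is the Gram
    # matrix of classifier outputs -- a convex quadratic in w.
    G = A
    g = np.diag(G).copy()
    # Combined convex quadratic: w^T Q w - 2 c^T w + const
    Q = A + lam_sparse * np.eye(N) + lam_div * G
    c = b + 0.5 * lam_div * g
    # Closed-form minimiser subject to 1^T w = 1, via the KKT linear system.
    K = np.block([[2 * Q, np.ones((N, 1))],
                  [np.ones((1, N)), np.zeros((1, 1))]])
    rhs = np.concatenate([2 * c, [1.0]])
    sol = np.linalg.solve(K, rhs)
    return sol[:N]   # ensemble weights w (Lagrange multiplier discarded)
```

Reintroducing the constraint $w \ge 0$ turns this into a standard QP that any generic solver handles; the pure closed form above covers only the equality-constrained case.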
Experiments
We perform several classification experiments with classifier ensembles under different experimental settings. First, we present a case study of the relation between accuracy, sparsity, and diversity in our proposed ensemble method. Then, ten data sets from the UCI machine learning repository are used in a 3-fold cross-validation experiment. We also evaluate different methods on a challenging data set, the Pascal Large Scale Learning Challenge 2008 webspam data, with 10-fold cross-validation.
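A hypothetical harness for the UCI-style 3-fold protocol, reusing the convex_ensemble_weights sketch from Section 4 (load_breast_cancer and the three base models serve only as stand-ins, not as the paper's actual setup):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer   # stand-in for a UCI data set
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
base = [DecisionTreeClassifier(max_depth=3, random_state=0),
        LogisticRegression(max_iter=1000),
        GaussianNB()]

accs = []
for tr, te in StratifiedKFold(n_splits=3, shuffle=True, random_state=0).split(X, y):
    for clf in base:
        clf.fit(X[tr], y[tr])
    # Classifier scores mapped to [-1, 1] so a zero decision threshold is meaningful.
    F_tr = np.column_stack([2 * clf.predict_proba(X[tr])[:, 1] - 1 for clf in base])
    w = convex_ensemble_weights(F_tr, 2.0 * y[tr] - 1)   # labels mapped to {-1, +1}
    F_te = np.column_stack([2 * clf.predict_proba(X[te])[:, 1] - 1 for clf in base])
    pred = (F_te @ w > 0).astype(int)
    accs.append((pred == y[te]).mean())
print("3-fold accuracy: %.3f +/- %.3f" % (np.mean(accs), np.std(accs)))
```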
Conclusion
Classifier ensemble is widely considered an effective technique for improving accuracy and stability over individual classifier components. While most previous ensemble methods focus on diversity or sparsity only, we propose a convex mathematical framework for classifier ensemble that takes both sparsity and diversity into account. The proposed framework leads to a convex quadratic programming problem, which enjoys a closed-form solution. In experiments, we compare our sparsity and diversity learning approach with representative ensemble methods on UCI data sets and the Pascal webspam data.
Acknowledgments
We would like to thank the anonymous reviewers for their constructive comments. This research is partly supported by the National Basic Research Program of China (2012CB316301), the National Natural Science Foundation of China (61105018 and 61175020), and the R&D Special Fund for Public Welfare Industry (Meteorology) of China (GYHY201106039 and GYHY201106047).
References (64)
- et al., Multiple classifiers combination by clustering and selection, Inf. Fusion (2001)
- et al., Ensembling neural networks: many could be better than all?, Artif. Intell. (2002)
- et al., A measure of competence based on random classification for dynamic ensemble selection, Inf. Fusion (2012)
- A special issue on applications of ensemble methods, Inf. Fusion (2008)
- et al., A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. Syst. Sci. (1997)
- Classifier combination based on confidence transformation, Pattern Recogn. (2005)
- et al., Feature combination using boosting, Pattern Recogn. Lett. (2005)
- et al., Sparse ensembles using weighted combination methods based on linear programming, Pattern Recogn. (2011)
- et al., Handwritten Chinese character recognition by metasynthetic approach, Pattern Recogn. (1997)
- et al., The combination of multiple classifiers using an evidential reasoning approach, Artif. Intell. (2008)
- Classifier selection for majority voting, Inf. Fusion
- Incremental construction of classifier and discriminant ensembles, Inf. Sci.
- Diversity in multiple classifier systems, Inf. Fusion
- Diversity creation methods: a survey and categorisation, Inf. Fusion
- Ensemble diversity measures and their application to thinning, Inf. Fusion
- Diversity measures for multiple classifier system analysis and design, Inf. Fusion
- Decision combination in multiple classifier systems, IEEE Trans. Pattern Anal. Mach. Intell.
- On combining classifiers, IEEE Trans. Pattern Anal. Mach. Intell.
- Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy, Mach. Learn.
- Ensemble Methods: Foundations and Algorithms
- Neural network ensembles, IEEE Trans. Pattern Anal. Mach. Intell.
- Boosting neural networks, Neural Comput.
- Regularized negative correlation learning for neural network ensembles, IEEE Trans. Neural Networks
- Multiobjective neural network ensembles based on regularized negative correlation learning, IEEE Trans. Knowl. Data Eng.
- Learning ensembles of neural networks by means of a Bayesian artificial immune system, IEEE Trans. Neural Networks
- Twenty years of mixture of experts, IEEE Trans. Neural Networks Learn. Syst.
- Applications of ensemble methods, Inf. Fusion
- An analysis of diversity measures, Mach. Learn.
- Bagging predictors, Mach. Learn.
- The strength of weak learnability, Mach. Learn.