Building forests of local trees
Introduction
The underlying idea that motivates the adoption of classifier ensembles is that, under proper conditions, a pool of classifiers can achieve better performance than a single one. The foundations of classifier ensembles are rooted in several proposals. For essential information on this matter, the interested reader may consult, for instance, [9], [18], [25], [41], [45].
Some conceptually simple guidelines must be followed to obtain good performance from a classifier ensemble, including the adoption of a suitable output combination policy and a training strategy or technique able to enforce diversity among its components. Output combination policies have been investigated over the years (see, for instance, [7], [34], [47]). By far, the most common are majority/plurality voting and simple averaging. Voting policies apply to ensembles whose base classifiers output a label, whereas averaging is recommended when base classifiers output a numeric value. Notably, weighted versions of the cited policies are also common. As for diversity, this desired property can be obtained by focusing on different aspects (not necessarily mutually exclusive). In particular, let us recall dataset manipulation [9], [11], [19], [23], feature manipulation [8], [12], [15], [26], [36], [44], randomness injection on relevant parameters of the learning algorithm [17], [29], methods based on the architectural heterogeneity of base classifiers [22], [27], [46], and specific strategies adopted during ensemble creation or thinning [4], [16], [24]. For more information on diversity analysis and enforcement, the interested reader may also consult [14], [31].
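The two most common combination policies mentioned above can be sketched in a few lines. The helper names below are illustrative, not taken from the paper; `plurality_vote` covers the (weighted) voting case for label outputs, while `average_scores` covers the (weighted) averaging case for numeric outputs such as class probabilities.

```python
import numpy as np

def plurality_vote(labels, weights=None):
    """Combine label outputs by (weighted) plurality voting.

    labels: 1-D array of class labels, one per base classifier.
    weights: optional per-classifier weights (uniform if None).
    """
    labels = np.asarray(labels)
    weights = np.ones(len(labels)) if weights is None else np.asarray(weights, dtype=float)
    classes = np.unique(labels)
    # Sum the weight of the classifiers voting for each class.
    scores = [weights[labels == c].sum() for c in classes]
    return classes[int(np.argmax(scores))]

def average_scores(scores, weights=None):
    """Combine numeric outputs (e.g. class probabilities) by (weighted) averaging.

    scores: array of shape (n_classifiers, n_classes).
    """
    return np.average(np.asarray(scores, dtype=float), axis=0, weights=weights)
```

For example, `plurality_vote([0, 1, 1])` returns `1`, while the weighted call `plurality_vote([0, 1, 1], [5.0, 1.0, 1.0])` returns `0`, since the first classifier's weight outvotes the other two combined.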
Beyond voting policies and diversity, it is worth mentioning that considerable effort (both theoretical and experimental) has been devoted to quantifying the performance gain of an ensemble over its embedded classifiers, taken individually. In particular, it has been shown that an ensemble is expected to have a lower error rate than its best-performing component, under the hypothesis that the embedded classifiers have low bias and high pairwise diversity [41].
In the proposed algorithm, diversity among the components of the ensemble is increased by forcing them to become experts on different regions of the sample space. One may object that limiting the scope of a classifier should increase its error rate over the whole feature space. However, this drawback is only apparent, provided that a proper output combination policy is adopted. Dynamically assigning a degree of expertise to each embedded classifier, depending on the distance between the sample at hand and the centroid of the region over which the classifier has been trained, has proven to be a viable policy.
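The distance-based expertise assignment can be sketched as follows. The Gaussian kernel and the bandwidth `tau` are illustrative assumptions, not necessarily the paper's exact weighting profile; the essential property is that the weight decreases with the distance between the sample and a classifier's centroid.

```python
import numpy as np

def expertise_weights(x, centroids, tau=1.0):
    """Degree of expertise of each local classifier for sample x.

    x: query sample, shape (n_features,).
    centroids: shape (n_classifiers, n_features), one centroid per classifier.
    Returns weights that sum to 1, decaying with distance from x
    (Gaussian decay is an illustrative choice).
    """
    d = np.linalg.norm(centroids - x, axis=1)
    w = np.exp(-(d / tau) ** 2)
    return w / w.sum()
```

These weights can then feed any weighted combination policy, so that the classifiers trained nearest to the query sample dominate the ensemble's decision.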
The rest of the paper is organized as follows. In the next section, some relevant ensemble methods proposed in the literature are briefly recalled. Section 3 provides a gentle introduction to the proposed algorithm by means of the well-known committee-based metaphor, whereas Section 4 introduces the proposal with relevant technical details. Section 5 provides experimental results and comparisons with other relevant kinds of ensemble classifiers, showing the effectiveness of the proposed approach. In Section 6, the proposed approach is discussed, with particular reference to its similarities with other approaches; its strengths and weaknesses are also briefly commented on. Section 7 concludes the manuscript.
Section snippets
Relevant work on ensemble methods
In 1996, Breiman [9] proposed bootstrap aggregating (Bagging, for short). The Bagging strategy consists of simulating the presence of multiple individual training sets (one for each classifier of the ensemble), obtained by bootstrapping the available dataset. In particular, each individual training set is generated by randomly drawing N samples with replacement from the original dataset of size N. Characterized by low bias and high variance, decision trees (DT) are the default base classifiers used for Bagging. As for output
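The bootstrap-and-vote scheme just recalled can be sketched in a few lines; the dataset and the ensemble size below are placeholders for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, random_state=0)
N = len(X)

# Train each tree on a bootstrap replicate: N samples drawn with replacement.
trees = []
for _ in range(25):
    idx = rng.integers(0, N, size=N)  # bootstrap indices
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Combine the trees' label outputs by plurality voting.
votes = np.stack([t.predict(X) for t in trees])  # (n_trees, n_samples)
pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```

In practice one would use `sklearn.ensemble.BaggingClassifier`, which implements exactly this strategy; the explicit loop is shown only to make the bootstrap step visible.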
The expert committee metaphor and its links to our proposal
To facilitate the understanding of our proposal, let us first describe it by means of the well-known expert committee metaphor. Let us consider a group of people, each with moderate performance levels, whose decisions on any given issue are combined to produce a collective outcome. According to the well-known Condorcet jury theorem [32], if individual decisions are independent and each is correct with probability greater than 0.5, then an increase in the collective performance, as a group, is
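The quantitative content of the Condorcet jury theorem is easy to verify directly: the probability that a majority of independent voters is correct is a binomial tail, and it grows with the committee size whenever each voter is correct with probability above 0.5.

```python
from math import comb

def majority_correct(n, p):
    """Probability that a majority of n independent voters, each correct
    with probability p, reaches the correct decision (n odd, so no ties)."""
    k0 = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k0, n + 1))
```

For instance, with p = 0.6 a single voter is correct 60% of the time, a committee of 3 about 64.8% of the time, and a committee of 11 about 75% of the time. Note the flip side: when p < 0.5, enlarging the committee makes the collective decision worse, which is why the independence and better-than-chance hypotheses matter.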
Forests of local trees
Setting aside the metaphor, let us better illustrate our proposal by focusing on its training and on the way it classifies unknown instances. Let us preliminarily note that the proposed ensemble strategy has been named "forest of local trees" (FLT, hereinafter). This designation jointly depends on two aspects, i.e., the injection of local knowledge during the training phase, on the one hand, and the design choice of using RDT as base learners, on the other hand. Each component of an
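A minimal sketch of the idea described above follows. It is an illustration built from the information in this extract, not the paper's exact algorithm: the centroid selection (random training prototypes), the Gaussian proximity kernel, and the local resampling scheme are all assumptions, and `max_features="sqrt"` merely stands in for the randomized trees used as base learners.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

class ForestOfLocalTreesSketch:
    """Illustrative sketch: each tree is trained on samples drawn with
    probability decreasing with their distance from a randomly chosen
    centroid, and at prediction time each tree's vote is weighted by the
    proximity of the query sample to that centroid."""

    def __init__(self, n_trees=25, tau=5.0, seed=0):
        self.n_trees, self.tau = n_trees, tau
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n = len(X)
        self.classes_ = np.unique(y)
        self.centroids, self.trees = [], []
        for _ in range(self.n_trees):
            c = X[self.rng.integers(n)]              # random prototype as centroid
            d = np.linalg.norm(X - c, axis=1)
            p = np.exp(-(d / self.tau) ** 2)         # sampling bias toward the region
            idx = self.rng.choice(n, size=n, replace=True, p=p / p.sum())
            tree = DecisionTreeClassifier(max_features="sqrt",
                                          random_state=int(self.rng.integers(1 << 31)))
            self.trees.append(tree.fit(X[idx], y[idx]))
            self.centroids.append(c)
        self.centroids = np.array(self.centroids)
        return self

    def predict(self, X):
        votes = np.stack([t.predict(X) for t in self.trees])   # (n_trees, n_samples)
        d = np.linalg.norm(self.centroids[:, None, :] - X[None, :, :], axis=2)
        w = np.exp(-(d / self.tau) ** 2)                       # per-tree expertise
        scores = np.stack([(w * (votes == c)).sum(axis=0) for c in self.classes_])
        return self.classes_[np.argmax(scores, axis=0)]

X, y = make_classification(n_samples=200, n_features=8, random_state=1)
pred = ForestOfLocalTreesSketch(n_trees=20, seed=0).fit(X, y).predict(X)
```

The key design point visible even in this sketch is that locality enters twice: once at training time (biased resampling around the centroid) and once at prediction time (distance-weighted voting), so that no single tree needs to be accurate over the whole feature space.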
Experiments and results
FLT has been tested on two test beds: i) 36 benchmark datasets, whose list is reported in Table 2, and ii) 10 medium-size datasets, whose list is reported in Table 3. They will be referenced as and hereinafter. Almost all datasets have been downloaded from the UCI ML repository. Experiments have been run using scikit-learn [35], a
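Since the paper's experiments compare FLT against Bagging, RF, and AdaBoost using scikit-learn, a comparison of this kind can be reproduced along the following lines. The dataset, fold count, and ensemble sizes below are placeholders, not the paper's actual protocol.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score

# Hypothetical replication of the comparison protocol on one UCI dataset.
X, y = load_breast_cancer(return_X_y=True)
for name, clf in [("Bagging", BaggingClassifier(n_estimators=50, random_state=0)),
                  ("RF", RandomForestClassifier(n_estimators=50, random_state=0)),
                  ("AdaBoost", AdaBoostClassifier(n_estimators=50, random_state=0))]:
    scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validated accuracy
    print(f"{name}: {scores.mean():.3f}")
```

Repeating this over each dataset of the two test beds, with FLT added to the list of competitors, yields accuracy tables of the kind summarized in Tables 2 and 3.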
Discussion
In this section, the proposed approach is discussed, emphasizing its relations with other relevant approaches. Moreover, its strengths and weaknesses are briefly summarized.
As for the links to other approaches, FLT draws mainly on RF, since, from an architectural perspective, the ensemble is in fact an RF. However, similarities with MRPE and boosting can also be found. Similarities with MRPE occur as, by construction, each RDT has an expertise that gradually decreases with the distance of
Conclusions and future work
In this paper, a novel method for training ensemble classifiers, called FLT, has been proposed, inspired by the concept of building ensemble classifiers whose components have local expertise on the given domain. FLT has demonstrated its effectiveness on a wide range of standard classification domains. It has been shown that the accuracy of FLT is almost always higher than that of AdaBoost, and often better than that of Bagging and RF.
As for future work, several options
Acknowledgments
This work has been supported by LR7 (grant number: 8029-1122) 2009 (Investment Funds for Basic Research) and by PIA (grant number: 1492-118) 2010 (Integrated Subsidized Packages), both funded by the local government of Sardinia.
Giuliano Armano is associate professor of computer engineering at the University of Cagliari, Italy. His research interests are on classifier ensembles (in particular mixtures of experts), hierarchical classification and performance measures for classifier systems. These topics have been experimented on various application fields, including bioinformatics, information retrieval and text categorization.
References (47)
- Diversity creation methods: a survey and categorisation, Inf. Fusion (2005)
- Attribute bagging: improving accuracy of classifier ensembles by using random feature subsets, Pattern Recognit. (2003)
- A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. Syst. Sci. (1997)
- Random subspace based ensemble sparse representation, Pattern Recognit. (2018)
- Learn++.MF: a random subspace approach for the missing feature problem, Pattern Recognit. (2010)
- Diversity in search strategies for ensemble feature selection, Inf. Fusion (2005)
- A survey of multiple classifier systems as hybrid systems, Inf. Fusion (2014)
- NXCS experts for financial time series forecasting
- Mixture of random prototype-based local experts
- Random prototype-based oracle for selection-fusion ensembles, Proc. of the 20th Int. Conference on Pattern Recognition (ICPR'10) (2010)
- A comparison of decision tree ensemble creation techniques, IEEE Trans. Pattern Anal. Mach. Intell.
- Analysis of a random forests model, J. Mach. Learn. Res.
- Consistency of random forests and other averaging classifiers, J. Mach. Learn. Res.
- New ensemble methods for evolving data streams, Proc. of the 15th ACM SIGKDD Int. Conference on Knowledge Discovery and Data Mining
- Random rotation ensembles, J. Mach. Learn. Res.
- Bagging predictors, Mach. Learn.
- Arcing classifier (with discussion and a rejoinder by the author), Ann. Stat.
- Randomizing outputs to increase prediction accuracy, Mach. Learn.
- Random forests, Mach. Learn.
- Classification and Regression Trees
- Dynamic selection of classifiers: a comprehensive review, Pattern Recognit.
- An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization, Mach. Learn.
- Ensemble methods in machine learning, Proc. of the First Int. Workshop on Multiple Classifier Systems (MCS'00)
Cited by (17)
OLP++: An online local classifier for high dimensional data
2023, Information Fusion. Citation excerpt: Another distinction can be done in terms of when the local information is taken into account, which can happen in each one (or several) of the MCS phases. Ensembles based on the divide and conquer principle tend to integrate the region definition in the generation phase [8,21,23], so that each member of the pool can specialize over a part of the feature space, therefore encouraging ensemble diversity through localization. They also usually apply a dynamic selection or aggregation rule in order to assign a higher importance to the base-classifiers that were encouraged to learn over a given area.
OIS-RF: A novel overlap and imbalance sensitive random forest
2021, Engineering Applications of Artificial Intelligence

Heterogeneous oblique random forest

2020, Pattern Recognition. Citation excerpt: From extensive applications of random forests in many diverse domains, it is evident that the recursive partitioning of the training data while optimizing some impurity criterion aid random forest to generalize better and by the virtue of subspace and ensemble method, RaF is able to achieve state-of-the-art performances. As reported in [26], employing linear classifiers in each internal node can lead to both “accurate” and “diverse” decision trees which is of vital importance for the success of random forests [1,27]. This is further verified by the large-scale benchmarking with state-of-the-art classifiers in [16,17].
Online local pool generation for dynamic classifier selection
2019, Pattern Recognition. Citation excerpt: Although the learning algorithm provides a local perspective on the classification problem, its concept was not used in the context of producing a pool of locally accurate classifiers for DS techniques. Other related works, such as the Mixture of Random Prototype-based Local Experts [11] and the Forest of Local Trees [12] techniques, explore the divide-to-conquer approach of MCS by locally training their base classifiers in different regions of the feature space and weighting the classifiers’ votes based on the distance between the query sample and their assigned region. As opposed to these works, in which the pool generation is paired to a selection based on dynamic distance weighting, our approach consists of producing on the fly a locally accurate pool to be coupled with a DCS technique.
Vote-boosting ensembles
2018, Pattern Recognition. Citation excerpt: In this method, complementarity is favored by simultaneously training all the classifiers in the ensemble: The parameters of the individual classifiers and the weights of the combination of their outputs are determined globally by minimizing a cost function that penalizes coincident predictions. One can also build ensembles of base learners that are trained to focus on different regions in feature space [5]. Boosting is another ensemble method in which complementarity among the classifiers is explicitly favored.
Emanuele Tamponi obtained his Ph.D. in computer engineering at the University of Cagliari, Italy (2015). His research interests are on data analysis, data complexity, ensemble methods, hierarchical classification and classifier complexity measures. Part of his Ph.D. was devoted to investigate the possibility of improving the performance of random forests.