
Pattern Recognition

Volume 76, April 2018, Pages 380-390

Building forests of local trees

https://doi.org/10.1016/j.patcog.2017.11.017

Highlights

  • A novel approach in the field of classifier ensembles is proposed.

  • The approach uses an ensemble of random decision trees.

  • Each decision tree is trained on a different area of the input space.

  • Areas can overlap and a good coverage of the input space is ensured.

  • Experimental results confirm the validity of the approach.

Abstract

Ensemble methods have been shown to be more effective than monolithic classifiers, in particular when their components are diverse. How to enforce diversity in classifier ensembles has received much attention from machine learning researchers, yielding a variety of techniques and algorithms. In this paper, a novel algorithm for building classifier ensembles is proposed, in which ensemble components are trained with a focus on different regions of the sample space. In so doing, diversity arises mainly as a consequence of deliberately limiting the scope of the base classifiers. The proposed algorithm shares roots with several ensemble paradigms, in particular with random forests, as it also generates forests of decision trees. As the decision trees are trained with a focus on specific subsets of the sample space, the resulting ensemble is in fact a forest of “local” trees. Comparative experimental results highlight that, on average, these ensembles perform better than other relevant kinds of ensemble classifiers, including random forests.

Introduction

The underlying idea that motivates the adoption of classifier ensembles is that, under proper conditions, a pool of classifiers can achieve better performance than a single one. The foundations of classifier ensembles are rooted in several proposals. For essential information on this matter, the interested reader may consult, for instance, [9], [18], [25], [41], [45].

Some conceptually simple guidelines must be followed to obtain good performance from a classifier ensemble, including the adoption of a suitable output combination policy and of a training strategy able to enforce diversity among its components. Output combination policies have been investigated over the years (see, for instance, [7], [34], [47]). By far the most common are majority/plurality voting and simple averaging: voting policies apply to ensembles whose base classifiers output a label, whereas averaging is recommended when base classifiers output a numeric value. Weighted versions of both policies are also common. As for diversity, this desired property can be obtained by acting on different aspects (not necessarily mutually exclusive), including dataset manipulation [9], [11], [19], [23], feature manipulation [8], [12], [15], [26], [36], [44], randomness injection on relevant parameters of the learning algorithm [17], [29], methods based on the architectural heterogeneity of base classifiers [22], [27], [46], and specific strategies adopted during ensemble creation or thinning [4], [16], [24]. For more information on diversity analysis and enforcement, the interested reader may also consult [14], [31].
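As an illustration of the two basic combination policies (a minimal sketch with hypothetical helper names, not code from the paper), plurality voting counts the labels emitted by the base classifiers, whereas averaging combines their per-class scores; the weighted variants are obtained simply by passing non-uniform weights.

```python
import numpy as np

def plurality_vote(labels, weights=None):
    """Combine hard label outputs by (optionally weighted) plurality voting."""
    labels = np.asarray(labels)
    weights = np.ones(len(labels)) if weights is None else np.asarray(weights, dtype=float)
    classes = np.unique(labels)
    support = np.array([weights[labels == c].sum() for c in classes])
    return classes[support.argmax()]

def average_scores(scores, weights=None):
    """Combine per-class score vectors by (optionally weighted) averaging."""
    scores = np.asarray(scores, dtype=float)      # shape: (n_classifiers, n_classes)
    weights = np.ones(len(scores)) if weights is None else np.asarray(weights, dtype=float)
    return (weights / weights.sum()) @ scores     # averaged class scores

# Three base classifiers voting on a label, and averaging class probabilities:
print(plurality_vote(["a", "b", "a"]))                       # -> a
print(average_scores([[0.9, 0.1], [0.4, 0.6], [0.7, 0.3]]))  # ~ [0.667, 0.333]
```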

Beyond combination policies and diversity, it is worth mentioning that great efforts (both theoretical and experimental) have been devoted to quantifying the performance gain of an ensemble with respect to its embedded classifiers, taken individually. In particular, it has been shown that an ensemble is expected to have a lower error rate than its best-performing component, under the hypothesis that the embedded classifiers have low bias and high pairwise diversity [41].

In the proposed algorithm, diversity among the components of the ensemble is increased by forcing them to become experts on different regions of the sample space. One may object that limiting the scope of a classifier should increase its error rate over the whole feature space. However, this drawback is only apparent, provided that a proper output combination policy is adopted. Dynamically assigning a degree of expertise to each embedded classifier, depending on the distance between the sample at hand and the centroid of the region over which the classifier has been trained, has proven to be a viable policy.
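A sketch of this distance-driven combination is given below. The Gaussian decay profile and the gamma parameter are assumptions made here only for illustration; the weighting function actually adopted by the proposed algorithm may differ.

```python
import numpy as np

def expertise_weights(x, centroids, gamma=1.0):
    """Degree of expertise of each base classifier on sample x.

    Illustrative assumption: expertise decays as a Gaussian of the distance
    between x and the centroid of the region the classifier was trained on.
    """
    d = np.linalg.norm(np.asarray(centroids) - np.asarray(x), axis=1)
    w = np.exp(-gamma * d ** 2)
    return w / w.sum()

def classify(x, classifiers, centroids, gamma=1.0):
    """Weighted average of the base classifiers' class probabilities."""
    w = expertise_weights(x, centroids, gamma)
    probas = np.array([clf.predict_proba([x])[0] for clf in classifiers])
    return int(np.argmax(w @ probas))             # index of the winning class
```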

The rest of the paper is organized as follows. In the next section, some relevant ensemble methods proposed in the literature are briefly recalled. Section 3 provides a gentle introduction to the proposed algorithm by means of the well known committee-based metaphor, whereas Section 4 introduces the proposal together with the relevant technical details. Section 5 reports experimental results and comparisons with other relevant kinds of ensemble classifiers, showing the effectiveness of the proposed approach. In Section 6, the proposed approach is discussed with particular reference to its similarities with other approaches; its strengths and weaknesses are also briefly commented upon. Section 7 concludes the manuscript.

Section snippets

Relevant work on ensemble methods

In 1996, Breiman [9] proposed bootstrap aggregating (Bagging, for short). The Bagging strategy consists of simulating the presence of multiple individual training sets (one for each classifier of the ensemble), obtained by bootstrapping the available dataset. In particular, each individual training set is generated by randomly selecting N samples, with replacement. Characterized by low bias and high variance, decision trees (DT) are the default base classifier used for Bagging. As for output
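A minimal sketch of the bootstrap-aggregating scheme recalled above (assuming NumPy arrays and integer-coded class labels; scikit-learn's BaggingClassifier provides the same functionality off the shelf, with a decision tree as its default base estimator):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_estimators=50, random_state=0):
    """Fit one decision tree per bootstrap replicate of the training set."""
    rng = np.random.RandomState(random_state)
    n = len(X)
    trees = []
    for _ in range(n_estimators):
        idx = rng.randint(0, n, size=n)                    # N samples drawn with replacement
        tree = DecisionTreeClassifier(random_state=rng.randint(2**31 - 1))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def bagging_predict(trees, X):
    """Combine the trees' hard outputs by plurality voting."""
    votes = np.array([tree.predict(X) for tree in trees])  # shape: (n_trees, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T.astype(int)])
```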

The expert committee metaphor and its links to our proposal

To facilitate the understanding of our proposal, let us first describe it by means of the well known expert committee metaphor. Let us consider a group of people, each with moderate individual performance, whose decisions on any given issue are combined to produce a collective outcome. According to the well known Condorcet jury theorem [32], if individual decisions are independent and each is correct with probability greater than 0.5, then an increase in the collective performance, as a group, is
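The effect predicted by the theorem is easy to verify numerically (a small worked example, not taken from the paper): with independent voters that are individually correct with probability p = 0.6, the probability that the majority decision is correct grows rapidly with the size of the committee.

```python
from math import comb

def majority_accuracy(p, n):
    """P(majority of n independent voters is correct), each correct with probability p (n odd)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1, n + 1))

for n in (1, 11, 101):
    print(n, round(majority_accuracy(0.6, n), 3))   # ~0.600, ~0.753, ~0.980
```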

Forests of local trees

Setting the metaphor aside, let us better illustrate our proposal by focusing on its training phase and on the way it classifies unknown instances. Let us preliminarily note that the proposed ensemble strategy has been named “forest of local trees” (FLT, hereinafter). The reason for this designation jointly depends on two aspects, i.e., the injection of local knowledge during the training phase, on the one hand, and the design choice of using RDTs as base learners, on the other. Each component of an
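The core idea of a “local” tree can be sketched as follows. The neighbourhood rule (the nearest fraction of the training set around a randomly drawn centroid), the feature-subsampling choice and all parameter names are illustrative assumptions, not the exact FLT procedure described in the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_local_trees(X, y, n_trees=100, locality=0.5, random_state=0):
    """Sketch: train each random decision tree on a neighbourhood of a random centroid."""
    rng = np.random.RandomState(random_state)
    k = max(1, int(locality * len(X)))                        # size of each local region
    forest = []
    for _ in range(n_trees):
        centroid = X[rng.randint(len(X))]                     # randomly drawn reference point
        nearest = np.argsort(np.linalg.norm(X - centroid, axis=1))[:k]
        tree = DecisionTreeClassifier(max_features="sqrt",    # randomness injection
                                      random_state=rng.randint(2**31 - 1))
        forest.append((tree.fit(X[nearest], y[nearest]), centroid))
    return forest

# The stored centroids allow each tree's vote to be weighted by its expertise at
# classification time, e.g. along the lines of the distance-based sketch given above.
```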

Experiments and results

FLT has been tested on two test beds: i) 36 benchmark datasets, whose list is reported in Table 2, and ii) 10 medium-size datasets, whose list is reported in Table 3. They will be referenced as T1 and T2 hereinafter. Almost all datasets have been downloaded from the UCI Machine Learning Repository. Experiments have been run using scikit-learn [35], a
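The comparative setup can be reproduced in outline with scikit-learn's stock ensembles (a minimal sketch: the dataset, fold count and hyperparameters below are placeholders, and FLT itself is omitted since it is not a library component).

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)    # stand-in for one of the benchmark datasets
for name, clf in [("RF", RandomForestClassifier(n_estimators=100, random_state=0)),
                  ("Bagging", BaggingClassifier(n_estimators=100, random_state=0)),
                  ("AdaBoost", AdaBoostClassifier(n_estimators=100, random_state=0))]:
    scores = cross_val_score(clf, X, y, cv=10)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```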

Discussion

In this section, the proposed approach is discussed, emphasizing its relations with other relevant approaches. Moreover, its strengths and weaknesses are briefly summarized.

As for the links to other approaches, FLT draws mainly on RF since, from an architectural perspective, the resulting ensemble is in fact an RF. However, similarities with MRPE and boosting can also be found. Similarities with MRPE occur as, by construction, each RDT has an expertise that gradually decreases with the distance of

Conclusions and future work

In this paper, a novel method for training ensemble classifiers, called FLT, has been proposed, inspired by the idea of building ensemble classifiers whose components have local expertise on the given domain. FLT has demonstrated its effectiveness on a wide range of standard classification domains. It has been shown that the accuracy of FLT is almost always higher than that obtained by AdaBoost, and often better than that obtained by Bagging and RF.

As for future work, several options

Acknowledgments

This work has been supported by LR7 (grant number: 8029-1122) 2009 (Investment Funds for Basic Research) and by PIA (grant number: 1492-118) 2010 (Integrated Subsidized Packages), both funded by the local government of Sardinia.


References (47)

  • R.E. Banfield et al.

    A comparison of decision tree ensemble creation techniques

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2007)
  • G. Biau

    Analysis of a random forests model

    J. Mach. Learn. Res.

    (2012)
  • G. Biau et al.

    Consistency of random forests and other averaging classifiers

    J. Mach. Learn. Res.

    (2008)
  • A. Bifet et al.

    New ensemble methods for evolving data streams

    Proc. of the 15th ACM SIGKDD Int. Conference on Knowledge Discovery and Data Mining

    (2009)
  • R. Blaser et al.

    Random rotation ensembles

    J. Mach. Learn. Res.

    (2016)
  • L. Breiman

    Bagging predictors

    Mach. Learn.

    (1996)
  • L. Breiman

    Arcing classifier (with discussion and a rejoinder by the author)

    Ann. Stat.

    (1998)
  • L. Breiman

    Randomizing outputs to increase prediction accuracy

    Mach. Learn.

    (2000)
  • L. Breiman

    Random forests

    Mach. Learn.

    (2001)
  • L. Breiman et al.

    Classification and Regression Trees

    (1984)
  • A. de Souza Britto Jr. et al.

    Dynamic selection of classifiers - a comprehensive review

    Pattern Recognit.

    (2014)
  • T.G. Dietterich

    An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization

    Mach. Learn.

    (2000)
  • T.G. Dietterich

    Ensemble methods in machine learning

    Proc. of the First Int. Workshop on Multiple Classifier Systems (MCS’00)

    (2000)
Cited by (17)

    • OLP++: An online local classifier for high dimensional data

      2023, Information Fusion
      Citation Excerpt:

      Another distinction can be done in terms of when the local information is taken into account, which can happen in each one (or several) of the MCS phases. Ensembles based on the divide and conquer principle tend to integrate the region definition in the generation phase [8,21,23], so that each member of the pool can specialize over a part of the feature space, therefore encouraging ensemble diversity through localization. They also usually apply a dynamic selection or aggregation rule in order to assign a higher importance to the base-classifiers that were encouraged to learn over a given area.

    • OIS-RF: A novel overlap and imbalance sensitive random forest

      2021, Engineering Applications of Artificial Intelligence
    • Heterogeneous oblique random forest

      2020, Pattern Recognition
      Citation Excerpt:

      From extensive applications of random forests in many diverse domains, it is evident that the recursive partitioning of the training data while optimizing some impurity criterion aid random forest to generalize better and by the virtue of subspace and ensemble method, RaF is able to achieve state-of-the-art performances. As reported in [26], employing linear classifiers in each internal node can lead to both “accurate” and “diverse” decision trees which is of vital importance for the success of random forests [1,27]. This is further verified by the large-scale benchmarking with state-of-the-art classifiers in [16,17].

    • Online local pool generation for dynamic classifier selection

      2019, Pattern Recognition
      Citation Excerpt:

      Although the learning algorithm provides a local perspective on the classification problem, its concept was not used in the context of producing a pool of locally accurate classifiers for DS techniques. Other related works, such as the Mixture of Random Prototype-based Local Experts [11] and the Forest of Local Trees [12] techniques, explore the divide-to-conquer approach of MCS by locally training their base classifiers in different regions of the feature space and weighting the classifiers’ votes based on the distance between the query sample and their assigned region. As opposed to these works, in which the pool generation is paired to a selection based on dynamic distance weighting, our approach consists of producing on the fly a locally accurate pool to be coupled with a DCS technique.

    • Vote-boosting ensembles

      2018, Pattern Recognition
      Citation Excerpt:

      In this method, complementarity is favored by simultaneously training all the classifiers in the ensemble: The parameters of the individual classifiers and the weights of the combination of their outputs are determined globally by minimizing a cost function that penalizes coincident predictions. One can also build ensembles of base learners that are trained to focus on different regions in feature space [5]. Boosting is another ensemble method in which complementarity among the classifiers is explicitly favored.


    Giuliano Armano is associate professor of computer engineering at the University of Cagliari, Italy. His research interests are on classifier ensembles (in particular mixtures of experts), hierarchical classification and performance measures for classifier systems. These topics have been experimented on various application fields, including bioinformatics, information retrieval and text categorization.

    Emanuele Tamponi obtained his Ph.D. in computer engineering at the University of Cagliari, Italy (2015). His research interests are on data analysis, data complexity, ensemble methods, hierarchical classification and classifier complexity measures. Part of his Ph.D. was devoted to investigate the possibility of improving the performance of random forests.
