Pattern Recognition Letters
Volume 20, Issue 9, 20 September 1999, Pages 961-965

Shrinking classification trees for bootstrap aggregation

https://doi.org/10.1016/S0167-8655(99)00064-1

Abstract

Bootstrap aggregating (bagging) classification trees has been shown to improve accuracy over using a single classification tree. However, it has been noticed that bagging pruned trees does not necessarily perform better than bagging unpruned trees. Our goal here is to discuss this issue in the context of shrinking, rather than pruning, trees. Because a bootstrap sample contains many duplicated observations, the usual shrinkage determined by cross-validation (CV) in bagging is so conservative that the resulting shrunken tree differs little from the unshrunken tree, which explains their similar performance. We propose choosing the shrinkage parameter for each base tree in bagging by using only the extra-bootstrap observations as test cases. For the digit data taken from Breiman et al. (1984), we find that our proposal leads to improved accuracy over bagging unshrunken trees.

Introduction

We consider the classification problem: suppose we have observations D={(Xi,Yi): i=1,…,n}, where Xi is a vector of features or attributes and Yi is the corresponding class label. The goal is to learn a classifier from {(Xi,Yi)} that can then be used to predict the class label Yj for any future observed attributes Xj. One popular tool is the classification or decision tree, such as CART and C4.5 (Breiman et al., 1984; Quinlan, 1993). Since a fully grown classification tree may over-fit the data and hence predict future observations poorly, pruning and shrinking are adopted to avoid over-fitting (Breiman et al., 1984; Hastie and Pregibon, 1990).

Breiman (1996a) introduced bootstrap aggregation (bagging) as a method to combine multiple versions of an unstable estimator, each constructed from a bootstrap sample, and showed that it improves over the single estimate based on the whole original sample. Classification trees, in particular, are well known to be unstable, so bagging classification trees has since attracted much research attention. Bagging classification trees works as follows. First we draw a bootstrap sample Db from the original sample D; i.e., each of the n observations in Db is drawn independently from D with equal probability and with replacement (Efron and Tibshirani, 1993). Then we construct a base tree Tb using Db, which may or may not be pruned or shrunken. This process is repeated B>1 times. For a future observation x, each trained tree Tb predicts its class as yb, and the bagging ensemble estimates the class of x as the one receiving the plurality of the votes {yb}.
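As an illustration of this procedure, the following is a minimal sketch of bagging unpruned classification trees with a plurality vote. It is not the paper's S code; it assumes scikit-learn's DecisionTreeClassifier as the base learner Tb and NumPy arrays X, y as the training sample D.

```python
# Minimal bagging sketch (illustrative only, not the paper's S implementation).
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def bag_trees(X, y, B=50, random_state=0):
    """Fit B unpruned base trees, each on a bootstrap sample D_b of (X, y)."""
    rng = np.random.RandomState(random_state)
    n = len(y)
    trees = []
    for _ in range(B):
        idx = rng.choice(n, size=n, replace=True)  # draw D_b with replacement
        tree = DecisionTreeClassifier()            # grown fully, no pruning or shrinking
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def bag_predict(trees, x):
    """Plurality vote over the base-tree predictions {y_b} for one observation x."""
    votes = [tree.predict(x.reshape(1, -1))[0] for tree in trees]
    return Counter(votes).most_common(1)[0][0]
```

A bagged prediction for a new observation x is then obtained as bag_predict(bag_trees(X, y), x).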

Some recent work (e.g. Bauer and Kohavi, 1998) shows that bagging pruned trees does not necessarily yield better performance than bagging unpruned trees. In this paper we explore a similar issue for bagging shrunken versus unshrunken trees. Using a well-known data set, we show that the usual method of bagging shrunken trees, analogous to bagging pruned trees, does not work well. This observation motivated us to propose a new way to shrink the base trees, leading to an ensemble with improved accuracy over that obtained with the usual method.

This paper is organized as follows. Section 2 is a brief review of shrinking and bagging classification trees. In Section 3, a modification is proposed for shrinking the base trees of the bagging ensemble.

Section snippets

Shrinking and bagging

We do not expect shrinking to differ technically from pruning in our current setting. However, since pruning has been studied more extensively in bagging, it is interesting in its own right to study shrinking in bagging. All of our simulations were conducted in the S environment (Becker et al., 1988).

In this paper, any unshrunken (i.e., full) tree is by default fully grown to fit the data exactly; in other words, all of the training examples in any of its terminal nodes belong to the same class. As an

A proposal for shrinking in bagging

Comparing the ratios of shrunken to unshrunken tree sizes in a bagging ensemble with those for a single tree (Fig. 1), we notice that there is indeed less shrinkage in bagging. This was also observed by Bauer and Kohavi (1998) in the context of bagging pruned trees. From bootstrap theory (Efron and Tibshirani, 1993), we know that on average only 63% of the original observations appear in a bootstrap sample. In other words, there are many replicated observations in a bootstrap
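To make the out-of-bag idea from the abstract concrete, here is a hedged sketch in which each base tree's complexity parameter is selected by testing candidate values only on the extra-bootstrap (out-of-bag) observations. Since scikit-learn offers cost-complexity pruning (ccp_alpha) rather than shrinking, that parameter serves merely as a stand-in for the shrinkage parameter discussed in the paper, and the helper name fit_oob_tuned_tree is hypothetical.

```python
# Sketch of out-of-bag tuning of a base tree's complexity parameter.
# On average only about 63% of the observations appear in a bootstrap sample,
# since P(observation i is in D_b) = 1 - (1 - 1/n)^n -> 1 - e^{-1} ~ 0.632,
# so roughly 37% of D is left over to serve as test cases for this tree.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_oob_tuned_tree(X, y, rng):
    """Fit one base tree whose complexity parameter is chosen on out-of-bag cases."""
    n = len(y)
    boot = rng.choice(n, size=n, replace=True)        # bootstrap indices
    oob = np.setdiff1d(np.arange(n), boot)            # extra-bootstrap (out-of-bag) cases
    # Candidate complexity values taken from the cost-complexity pruning path.
    alphas = DecisionTreeClassifier().cost_complexity_pruning_path(
        X[boot], y[boot]).ccp_alphas
    best_alpha, best_acc = 0.0, -1.0
    for a in alphas:
        t = DecisionTreeClassifier(ccp_alpha=a).fit(X[boot], y[boot])
        acc = t.score(X[oob], y[oob])                 # evaluate only on out-of-bag cases
        if acc > best_acc:
            best_alpha, best_acc = a, acc
    return DecisionTreeClassifier(ccp_alpha=best_alpha).fit(X[boot], y[boot])
```

Repeating this for each bootstrap sample (with, e.g., rng = np.random.RandomState(0)) and combining the resulting trees by plurality vote yields an ensemble in the spirit of the proposal, in contrast to tuning each tree by cross-validation within its own bootstrap sample.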

Acknowledgments

The author is grateful to the Referees and Editors for helpful suggestions.

References (16)

  • Freund, Y., et al., 1997. A decision-theoretic generalization of on-line learning and an application to boosting. J. Computer and System Sci.
  • Bahl, L.R., et al., 1989. A tree-based statistical language model for natural language speech recognition. IEEE Trans. Acoust. Speech Signal Process.
  • Bauer, E., Kohavi, R., 1998. An empirical comparison of voting classification algorithms: Bagging, Boosting, and...
  • Becker, R.A., et al., 1988. The New S Language: A Programming Environment for Data Analysis and Graphics.
  • Breiman, L., 1996a. Bagging predictors. Machine Learning.
  • Breiman, L., 1996b. Pasting bites together for prediction in large data sets and on-line. Technical Report, Statistics...
  • Breiman, L., et al., 1984. Classification and Regression Trees.
  • Buntine, W.L., 1992. Learning classification trees. Statistics and Computing.