Abstract
Boosting is a new, powerful method for classification. It is an iterative procedure which successively classifies a weighted version of the sample, and then reweights this sample depending on how successful the classification was. In this paper we review some of the commonly used methods for performing boosting and show how they can be fitted into a Bayesian setup at each iteration of the algorithm. We demonstrate how this formulation gives rise to a new splitting criterion when using a domain-partitioning classification method such as a decision tree. Further, we can improve the predictive performance of simple decision trees, known as stumps, by classifying at each step of the algorithm with a posterior weighted average of them, rather than with a single stump. The main advantage of this approach is that it reduces the number of boosting iterations required to produce a good classifier, with only a minimal increase in the computational complexity of the algorithm.
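As background to the iterative reweighting scheme described above, the following is a minimal sketch of standard discrete AdaBoost with decision stumps as the weak learner; it is not the paper's Bayesian-stump procedure, and the function names (`fit_stump`, `adaboost_stumps`) are illustrative only.

```python
import numpy as np

def fit_stump(X, y, w):
    """Exhaustive search for the one-split stump with lowest weighted error.

    X: (n, p) feature array; y: (n,) labels in {-1, +1};
    w: (n,) non-negative observation weights summing to one.
    """
    best, best_err = (0, X[0, 0], 1), np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = s * np.where(X[:, j] <= t, 1, -1)
                err = np.sum(w[pred != y])
                if err < best_err:
                    best_err, best = err, (j, t, s)
    return best, best_err

def stump_predict(X, stump):
    j, t, s = stump
    return s * np.where(X[:, j] <= t, 1, -1)

def adaboost_stumps(X, y, n_rounds=50):
    """Discrete AdaBoost: refit a stump to reweighted data at each round."""
    n = X.shape[0]
    w = np.full(n, 1.0 / n)                      # start with uniform weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump, err = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)    # weight given to this stump
        pred = stump_predict(X, stump)
        w *= np.exp(-alpha * y * pred)           # up-weight misclassified points
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    agg = sum(a * stump_predict(X, s) for s, a in zip(stumps, alphas))
    return np.sign(agg)
```

The paper's contribution, loosely speaking, is to replace the single best stump fitted at each round with a posterior weighted average over candidate stumps, so that fewer boosting rounds are needed; the sketch above shows only the conventional single-stump baseline.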
Denison, D.G.T. Boosting with Bayesian stumps. Statistics and Computing 11, 171–178 (2001). https://doi.org/10.1023/A:1008931416845