On the Interpretation of Ensemble Classifiers in Terms of Bayes Classifiers

Abstract

Many of the best classifiers are ensemble methods such as bagging, random forests, boosting, and Bayes model averaging. We give conditions under which each of these four classifiers can be regarded as a Bayes classifier. We also give conditions under which stacking achieves the minimal Bayes risk.
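
As a concrete illustration only (the paper itself contains no code), the following sketch assumes Python with NumPy, SciPy, and scikit-learn. It simulates two-class Gaussian data whose class-conditional densities are known, so the Bayes classifier is available in closed form and the test errors of bagging, a random forest, and boosting can be compared against the minimal Bayes risk that the paper's conditions refer to.

```python
# Illustrative sketch only (not the authors' code): with known
# class-conditional densities, the Bayes classifier is explicit, so
# ensemble test errors can be compared against the minimal Bayes risk.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 4000, 2
y = rng.integers(0, 2, size=n)              # equal priors: P(Y = 1) = 1/2
X = y[:, None] + rng.normal(size=(n, d))    # class means (0, 0) and (1, 1)

# Bayes rule under equal priors: predict 1 iff p1(x) >= p0(x).
p0 = multivariate_normal(np.zeros(d), np.eye(d)).pdf
p1 = multivariate_normal(np.ones(d), np.eye(d)).pdf
bayes_err = np.mean((p1(X) >= p0(X)).astype(int) != y)
print(f"Bayes classifier error on this sample: {bayes_err:.3f}")

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
for clf in (BaggingClassifier(n_estimators=100, random_state=0),
            RandomForestClassifier(n_estimators=100, random_state=0),
            GradientBoostingClassifier(random_state=0)):
    err = np.mean(clf.fit(Xtr, ytr).predict(Xte) != yte)
    print(f"{type(clf).__name__} test error: {err:.3f}")
```

Bayes model averaging is omitted from the sketch for brevity; the point is only that the Bayes risk supplies the benchmark against which the ensemble methods are judged.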

We compare the four classifiers with a logistic regression classifier to assess the cost of interpretability. First, we characterize the increase in risk from using an ensemble method in a logistic classifier versus using it directly. Second, we characterize the change in risk from applying logistic regression to an ensemble method versus using the logistic classifier itself. Third, we give necessary and sufficient conditions for the logistic classifier to be worse than combining the logistic classifier and the Bayes classifier. Since the earlier conditions let an ensemble method play the role of the Bayes classifier, these results extend to ensemble classifiers that are asymptotically Bayes.
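
Under the same assumptions as the sketch above (Python with scikit-learn; purely hypothetical, not the authors' construction), the three comparisons can be mimicked empirically. "Applying logistic regression to an ensemble method" is read here as fitting a logistic final stage to the out-of-fold probability outputs of the base ensembles, via scikit-learn's StackingClassifier; this is one natural reading, not necessarily the paper's.

```python
# Illustrative sketch only: compare the interpretable logistic baseline,
# an ensemble used directly, and logistic regression applied to the
# outputs of ensemble members (a stacked combination).
import numpy as np
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n, d = 4000, 2
y = rng.integers(0, 2, size=n)
X = y[:, None] + rng.normal(size=(n, d))    # same toy setup as above
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=1)

# The interpretable baseline.
logit = LogisticRegression().fit(Xtr, ytr)

# An ensemble used directly.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xtr, ytr)

# Logistic regression applied to ensemble outputs: the final logistic
# stage is fit to out-of-fold probability predictions of the base models.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("gb", GradientBoostingClassifier(random_state=0))],
    final_estimator=LogisticRegression(),
).fit(Xtr, ytr)

for name, clf in [("logistic alone", logit),
                  ("ensemble alone", forest),
                  ("logistic on ensemble", stack)]:
    err = np.mean(clf.predict(Xte) != yte)
    print(f"{name}: test error {err:.3f}")
```

The gaps between the three printed errors give an empirical analogue of the risk differences the paper characterizes analytically.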

Author information

Correspondence to Bertrand Clarke.

Cite this article

Le, T., Clarke, B. On the Interpretation of Ensemble Classifiers in Terms of Bayes Classifiers. J Classif 35, 198–229 (2018). https://doi.org/10.1007/s00357-018-9257-y
