Abstract
Many of the best classifiers are ensemble methods such as bagging, random forests, boosting, and Bayes model averaging. We give conditions under which each of these four classifiers can be regarded as a Bayes classifier. We also give conditions under which stacking achieves the minimal Bayes risk.
We compare the four classifiers with a logistic regression classifier to assess the cost of interpretability. First, we characterize the increase in risk from using an ensemble method within a logistic classifier versus using the ensemble method directly. Second, we characterize the change in risk from applying logistic regression to an ensemble method versus using the logistic classifier itself. Third, we give necessary and sufficient conditions for the logistic classifier to be worse than a combination of the logistic classifier and the Bayes classifier. These results therefore extend to ensemble classifiers that are asymptotically Bayes.
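As a rough illustration of the quantities the abstract refers to, the sketch below computes the Bayes risk under 0-1 loss and the excess risk of a competing rule. It is not taken from the paper: the three-point distribution `p_x`, the conditional probabilities `eta`, and the competing rule `threshold_rule` are all hypothetical values chosen for the example.

```python
# Hypothetical toy distribution (an assumption, not from the paper).
p_x = {0: 0.5, 1: 0.3, 2: 0.2}   # marginal P(X = x)
eta = {0: 0.1, 1: 0.6, 2: 0.9}   # conditional P(Y = 1 | X = x)


def bayes_classifier(x):
    """Predict the more probable class given x (optimal under 0-1 loss)."""
    return 1 if eta[x] >= 0.5 else 0


def threshold_rule(x):
    """Hypothetical competing rule: predict 1 only when x == 2."""
    return 1 if x == 2 else 0


def risk(classifier):
    """Misclassification probability of a classifier under 0-1 loss."""
    total = 0.0
    for x, px in p_x.items():
        # Error at x is P(Y=0|x) if we predict 1, otherwise P(Y=1|x).
        err = 1.0 - eta[x] if classifier(x) == 1 else eta[x]
        total += px * err
    return total


bayes_risk = risk(bayes_classifier)   # smallest risk any classifier can achieve
other_risk = risk(threshold_rule)
excess = other_risk - bayes_risk      # never negative

print(f"Bayes risk:  {bayes_risk:.2f}")
print(f"Other risk:  {other_risk:.2f}")
print(f"Excess risk: {excess:.2f}")
```

Under these assumed values the two rules disagree only at `x = 1`, so the excess risk equals `p_x[1] * abs(2 * eta[1] - 1)`. A classifier that is asymptotically Bayes is one whose excess risk shrinks to zero as the sample size grows.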
Cite this article
Le, T., Clarke, B. On the Interpretation of Ensemble Classifiers in Terms of Bayes Classifiers. J Classif 35, 198–229 (2018). https://doi.org/10.1007/s00357-018-9257-y
DOI: https://doi.org/10.1007/s00357-018-9257-y