On the Interpretation of Ensemble Classifiers in Terms of Bayes Classifiers

Le, Tri; Clarke, Bertrand

doi:10.1007/s00357-018-9257-y

Published: 26 July 2018

Journal of Classification, Volume 35, pages 198–229 (2018)

Abstract

Many of the best classifiers are ensemble methods such as bagging, random forests, boosting, and Bayes model averaging. We give conditions under which each of these four classifiers can be regarded as a Bayes classifier. We also give conditions under which stacking achieves the minimal Bayes risk.
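
To make the phrase "regarded as a Bayes classifier" concrete, the following Python sketch shows three pieces: the Bayes classifier under 0-1 loss (predict the class with the largest posterior probability), a Bayes model averaging version that applies that same rule to a posterior-weighted mixture of the models' predicted class probabilities, and a bagging-style plurality vote written as an argmax of vote proportions. It is an illustration under stated assumptions, not the authors' construction: the function names, the two-class example, and the model weights are hypothetical, and the paper's actual conditions are not reproduced here.

```python
from typing import List, Sequence


def bayes_classify(class_probs: Sequence[float]) -> int:
    """Bayes classifier under 0-1 loss: return the class with the largest
    posterior probability (ties go to the lowest class index)."""
    return max(range(len(class_probs)), key=lambda k: class_probs[k])


def bma_class_probs(model_weights: Sequence[float],
                    model_class_probs: Sequence[Sequence[float]]) -> List[float]:
    """Posterior-weighted mixture of each model's class probabilities at one x.

    model_weights are (possibly unnormalized) posterior model probabilities;
    model_class_probs[m][k] is model m's predicted P(Y = k | x).
    """
    total = sum(model_weights)
    n_classes = len(model_class_probs[0])
    return [
        sum(w * probs[k] for w, probs in zip(model_weights, model_class_probs)) / total
        for k in range(n_classes)
    ]


def majority_vote(votes: Sequence[int], n_classes: int) -> int:
    """Bagging-style plurality vote: the argmax of the empirical vote
    proportions, i.e. the Bayes rule applied to the vote distribution."""
    counts = [0] * n_classes
    for v in votes:
        counts[v] += 1
    return bayes_classify(counts)


# Hypothetical usage: three models, two classes, one query point x.
weights = [0.5, 0.3, 0.2]
probs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
print(bayes_classify(bma_class_probs(weights, probs)))  # → 1
```

In this toy case the single highest-weight model favors class 0, yet the averaged probabilities (0.43 versus 0.57) favor class 1, so the averaged classifier is the Bayes rule for the mixture rather than for any one model.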

We compare the four classifiers with a logistic regression classifier to assess the cost of interpretability. First, we characterize the increase in risk from using an ensemble method in a logistic classifier versus using it directly. Second, we characterize the change in risk from applying logistic regression to an ensemble method versus using the logistic classifier itself. Third, we give necessary and sufficient conditions for the logistic classifier to be worse than combining the logistic classifier and the Bayes classifier. Because these results are stated in terms of the Bayes classifier, they extend to ensemble classifiers that are asymptotically Bayes.
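
One way to picture the "cost of interpretability" is the gap between the Bayes risk and the smallest 0-1 risk attainable by a logistic-form decision rule. The Python sketch below computes both exactly for a hypothetical four-point distribution whose P(Y=1 | x) rises and then falls, so no rule of the form 1{a + bx > 0} can reproduce the Bayes classifier. A fitted logistic regression classifier is one such rule, so its risk can be no smaller than the best-in-class value computed here. The distribution and the function names are assumptions made for illustration; this is not the risk characterization derived in the paper.

```python
from fractions import Fraction

# Hypothetical toy distribution: X uniform on four points, with P(Y=1 | X=x) = ETA[x].
# ETA is non-monotone in x, so no single threshold in x matches the Bayes rule.
XS = [0, 1, 2, 3]
P_X = Fraction(1, 4)
ETA = {0: Fraction(2, 10), 1: Fraction(8, 10), 2: Fraction(8, 10), 3: Fraction(2, 10)}


def risk(rule):
    """Exact 0-1 risk of a deterministic rule x -> {0, 1} under the toy distribution."""
    return sum(P_X * (1 - ETA[x] if rule(x) == 1 else ETA[x]) for x in XS)


def bayes_rule(x):
    """Bayes classifier: predict 1 exactly when P(Y=1 | x) > 1/2."""
    return 1 if ETA[x] > Fraction(1, 2) else 0


def best_logistic_risk():
    """Smallest 0-1 risk over rules of the form 1{a + b*x > 0}.

    In one dimension these are the two constant rules and the half-line rules
    1{x > c} and 1{x < c}, so it suffices to try cut points between grid points.
    """
    cuts = [Fraction(-1, 2)] + [Fraction(2 * x + 1, 2) for x in XS]
    candidates = [lambda x: 0, lambda x: 1]
    for c in cuts:
        candidates.append(lambda x, c=c: 1 if x > c else 0)
        candidates.append(lambda x, c=c: 1 if x < c else 0)
    return min(risk(r) for r in candidates)


print(risk(bayes_rule), best_logistic_risk())  # → 1/5 7/20
```

Here the Bayes risk is 1/5 while the best logistic-form rule has risk 7/20, an excess of 3/20 that any logistic classifier on this distribution must pay at least.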

Author information

Authors and Affiliations

University of Nebraska-Lincoln, 340 Hardin Hall North Wing, Lincoln, NE, USA, 68583-0963
Tri Le & Bertrand Clarke

Corresponding author

Correspondence to Bertrand Clarke.

About this article

Cite this article

Le, T., Clarke, B. On the Interpretation of Ensemble Classifiers in Terms of Bayes Classifiers. J Classif 35, 198–229 (2018). https://doi.org/10.1007/s00357-018-9257-y

Issue Date: July 2018