
Explainable Ensemble Trees

  • Original paper
  • Published in Computational Statistics

Abstract

Ensemble methods are supervised learning algorithms that achieve high predictive accuracy by training many models. Random forest is probably the most widely used ensemble method for regression and classification problems. It builds decision trees on different bootstrap samples and takes their majority vote for classification or their average for regression. However, such an algorithm suffers from a lack of explainability and thus does not allow users to understand how particular decisions are made. To improve on that, we propose a new way of interpreting an ensemble tree structure. Starting from a random forest model, our approach graphically explains the relationship structure between the response variable and the predictors. The proposed method is useful in real-world settings where model interpretation for predictive purposes is crucial. The proposal is evaluated on real data sets.
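The aggregation step described above (majority vote for classification, average for regression) can be sketched as follows. This is a minimal illustration, not the paper's method: the function names and the per-tree predictions are hypothetical, standing in for the outputs of an already-trained forest.

```python
from collections import Counter
from statistics import mean

def aggregate_classification(tree_votes):
    """Return the majority class across the trees' predicted labels."""
    return Counter(tree_votes).most_common(1)[0][0]

def aggregate_regression(tree_outputs):
    """Return the average of the trees' numeric predictions."""
    return mean(tree_outputs)

# Hypothetical predictions from an ensemble of five trees.
votes = ["setosa", "versicolor", "setosa", "setosa", "virginica"]
print(aggregate_classification(votes))                   # setosa
print(aggregate_regression([2.1, 1.9, 2.3, 2.0, 2.2]))   # 2.1
```

In an actual random forest each tree is also grown on a different bootstrap sample with a random subset of predictors considered at each split; only the final combination of the trees' outputs is shown here.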



Funding

Funding was provided by Ministero dell'Istruzione, dell'Università e della Ricerca (Grant No. PRIN 2017, ID: 2017KZZLYP). Giuseppe Pandolfo acknowledges the support of the National Operative Program (PON) Ricerca e Innovazione 2014-2020 (PON R&I) - Azione IV.4 - "Dottorati e contratti di ricerca su tematiche dell'innovazione".

Author information


Corresponding author

Correspondence to Massimo Aria.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Aria, M., Gnasso, A., Iorio, C. et al. Explainable Ensemble Trees. Comput Stat 39, 3–19 (2024). https://doi.org/10.1007/s00180-022-01312-6
