
Information Sciences

Volume 572, September 2021, Pages 522-542

Approximating XGBoost with an interpretable decision tree

https://doi.org/10.1016/j.ins.2021.05.055

Highlights

  • A GBDT model can be converted into a single decision tree.

  • The generated tree approximates the accuracy of its source forest.

  • The developed tree provides interpretable classifications, unlike GBDT models.

  • The generated tree outperforms CART-induced trees in terms of predictive performance.

  • The complexity of the tree can be configured by the method user.

Abstract

The increasing usage of machine-learning models in critical domains has recently stressed the necessity of interpretable machine-learning models. In areas like healthcare and finance, the model consumer must understand the rationale behind the model output in order to use it when making a decision. For this reason, it is impossible to use black-box models in these scenarios, regardless of their high predictive performance. Decision forests, and in particular Gradient Boosting Decision Trees (GBDT), are examples of this kind of model. GBDT models are considered the state-of-the-art in many classification challenges, reflected by the fact that the majority of Kaggle’s recent winners used GBDT methods (such as XGBoost) as a part of their solution. But despite their superior predictive performance, they cannot be used in tasks that require transparency. This paper presents a novel method for transforming a decision forest of any kind into an interpretable decision tree. The method extends the tool-set available for machine-learning practitioners who want to exploit the interpretability of decision trees without significantly impairing the predictive performance gained by GBDT models like XGBoost. We show in an empirical evaluation that in some cases the generated tree is able to approximate the predictive performance of an XGBoost model while enabling better transparency of the outputs.

Introduction

The increased deployment of machine-learning models in new domains has recently accelerated a broad discussion around the importance of model interpretability [5]. In some domains, experts are required either to understand the mechanism by which the model works or to be able to justify decisions that are based on model outputs. It is not sufficient, for example, for a medical diagnosis model to be accurate. It also needs to be transparent to the expert that uses its output when making a decision about a certain patient. But even in scenarios that allow a larger margin of error, humans are less likely to accept models that they cannot comprehend. Interpretable machine-learning models address these issues. They are defined as models that can be clearly visualized or explained to the end-user using plain text [29]. A decision tree is a broadly used example of an interpretable model. Every classification made by a decision tree can be associated with a corresponding decision path. In addition, the hierarchical structure of the model as a whole can be easily visualized or described to users of any level of expertise [4]. But despite their high level of interpretability, decision trees have limited predictive performance due to the myopic nature of their induction algorithms. These algorithms usually fail to capture complex interactions among the input features, leading to fundamental biases in cases where such interactions exist. This issue can be addressed by training an ensemble of decision trees, also known as a decision forest.
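
To make the notion of a decision path concrete, the following minimal sketch (using scikit-learn's public decision_path API, not code from the paper) prints the explicit root-to-leaf path that justifies a single prediction of a decision tree:

```python
# A minimal sketch (not code from the paper): every decision tree
# prediction maps to an explicit root-to-leaf decision path.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

sample = X[:1]
path = clf.decision_path(sample)          # sparse matrix of visited nodes
leaf = clf.apply(sample)[0]
feat, thr = clf.tree_.feature, clf.tree_.threshold

for node in path.indices:
    if node == leaf:
        print(f"leaf {node}: predict class {clf.predict(sample)[0]}")
    else:
        op = "<=" if sample[0, feat[node]] <= thr[node] else ">"
        print(f"node {node}: x[{feat[node]}] {op} {thr[node]:.3f}")
```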

Decision forests combine multiple decision trees to provide a single output in supervised machine-learning tasks. The ability of decision forests to integrate different hypotheses in a single model and their robustness to any type of relational dataset have driven their popularity within the data science community [37]. Gradient Boosting Decision Tree (GBDT) is a sub-group of decision forests that includes models like XGBoost, CatBoost, and LightGBM. These models have recently been found to be highly effective in numerous tasks, as reflected by the fact that most of Kaggle’s recent winners used these methods in their solutions. However, decision forests as a whole, and GBDT models in particular, are considered to be black-box models. Each classification made by a decision forest must traverse numerous different trees, so the end-user cannot obtain a clear justification of the predictions made by the model. Furthermore, it is impossible for the end-user to grasp the model structure, as it is practically composed of numerous single models. A plethora of studies have addressed the complexity of decision forests by presenting ensemble pruning approaches. These approaches aim at filtering a subset of base trees that performs at least as well as the original decision forest [28]. While these methods reduce the complexity without impairing the predictive performance, the end result cannot be considered an interpretable model. Several studies, mainly from the past few years, have presented methods for transforming a decision forest into a single decision tree. Some of these methods require the synthesis of a large set of unlabeled data [16], [50], while others propose algorithms that manipulate the ensemble nodes into a new decision tree [42], [43], [45].
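
The opacity described above is easy to quantify. The short sketch below (an illustration using the standard xgboost API, not the authors' code) shows that a single XGBoost prediction aggregates the outputs of hundreds of trees, leaving no single decision path to present to the end-user:

```python
# Illustration (not from the paper): one XGBoost prediction aggregates
# the scores of every tree in the ensemble.
import xgboost as xgb
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
model = xgb.XGBClassifier(n_estimators=300, max_depth=4).fit(X, y)

booster = model.get_booster()
print("trees consulted per prediction:", len(booster.get_dump()))

# pred_leaf=True returns, per instance, the leaf reached in each tree:
# 300 separate partial "explanations" instead of one decision path.
leaves = booster.predict(xgb.DMatrix(X[:1]), pred_leaf=True)
print("leaf indices per tree:", leaves.shape)   # -> (1, 300)
```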

This paper presents a novel method for generating a single decision tree that is based on a previously trained decision forest. The generated tree aims at approximating the predictive performance of the decision forest. At the same time, it provides an explanatory mechanism for its classifications, enabling the end-user to understand its structure. This work can be viewed as an extension of the Forest-Based Tree (FBT) method that was developed and evaluated for independently induced decision forests, i.e., decision forests in which the base trees are trained independently [39] and usually consist of a small number of deep trees rather than a large number of shallow trees. The main contribution of the new method is its applicability to both independently induced decision forests (e.g., random forest) and dependently induced decision forests (e.g., gradient boosting machines and XGBoost). This is done by refining the algorithm to focus on extracting information that is relevant to the original training set rather than considering only the explicit characteristics of the base trees. Another important contribution of the developed method is that it enables the configuration of the tree complexity by determining its maximum depth. Consequently, this configurable method allows its users to better control and address the trade-off between interpretability and predictive performance. The method includes three main stages. First, we apply ensemble pruning to the pre-trained ensemble. Then we extract a representative set of conjunctions from the pruned ensemble, and finally, we build a decision tree that organizes the conjunction set in a tree structure. The remainder of the paper is structured as follows: In Section 2 we lay the scientific background and present related studies. In Section 3, we present the developed method. Section 4 presents an experimental evaluation on binary classification challenges and discusses its results. Section 5 concludes and suggests future research directions.
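
As a rough, runnable point of reference for the stated goal (this is a generic distillation-style stand-in, not the conjunction-set method presented in this paper), one can refit a single depth-limited CART tree on the predictions of a trained XGBoost model and compare test accuracies:

```python
# Distillation-style stand-in (not the authors' algorithm): fit a single
# depth-limited tree on the forest's own training-set predictions.
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = xgb.XGBClassifier(n_estimators=200, max_depth=4).fit(X_tr, y_tr)

# Relabel the training set with the forest's outputs, then fit one tree
# whose maximum depth plays the role of the user-configured complexity.
tree = DecisionTreeClassifier(max_depth=5, random_state=0)
tree.fit(X_tr, forest.predict(X_tr))

print("forest accuracy:", accuracy_score(y_te, forest.predict(X_te)))
print("tree accuracy:  ", accuracy_score(y_te, tree.predict(X_te)))
```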

Section snippets

Background

Decision forests, and in particular gradient boosting decision trees (GBDT), are considered the best practice in many classification challenges [11], [38]. However, interpretable models like decision trees are usually preferred over decision forests when either the model or its predictions are required to be transparent to the end-user. Building a decision tree that approximates the predictive performance of a given decision forest, with a focus on GBDT models, is the subject of this paper and in the…

Converting gradient boosting decision tree into a single tree

This section presents a method for generating a decision tree from a given decision forest. The presented method includes extensions and refinements to the forest-based tree (FBT) method presented in [39]. The main refinement adapts the conjunction-set generation stage that breaks the decision forest into numerous building blocks. This stage was refined to consider the properties of the training data (e.g., class distribution, conjunctions that are relevant to the training instances) in…
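
To illustrate the conjunction building block this stage operates on, the following sketch (an illustrative reconstruction under our own simplifying assumptions, not the paper's implementation) enumerates the root-to-leaf conjunctions of each base tree of a GBDT and keeps only those satisfied by at least one training instance:

```python
# Illustrative reconstruction (our simplification, not the paper's code):
# each root-to-leaf path of a base tree is a conjunction of threshold
# conditions; keep only conjunctions matched by some training instance.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)
gbdt = GradientBoostingClassifier(n_estimators=10, max_depth=3,
                                  random_state=0).fit(X, y)

def leaf_conjunctions(tree):
    """Enumerate (conditions, leaf_value) pairs, one per leaf."""
    t = tree.tree_
    paths = []
    def walk(node, conds):
        if t.children_left[node] == -1:                 # leaf node
            paths.append((conds, t.value[node].ravel()))
            return
        f, thr = t.feature[node], t.threshold[node]
        walk(t.children_left[node],  conds + [(f, "<=", thr)])
        walk(t.children_right[node], conds + [(f, ">",  thr)])
    walk(0, [])
    return paths

def matched_by_training_data(conds, X):
    mask = np.ones(len(X), dtype=bool)
    for f, op, thr in conds:
        mask &= (X[:, f] <= thr) if op == "<=" else (X[:, f] > thr)
    return mask.any()

# gbdt.estimators_ holds one regression tree per boosting step (binary case).
conjunctions = [c for (est,) in gbdt.estimators_
                for c in leaf_conjunctions(est)
                if matched_by_training_data(c[0], X)]
print(f"kept {len(conjunctions)} training-relevant conjunctions")
```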

Experimental evaluation

Here we present an empirical evaluation of the developed algorithm by testing its ability to convert GBDT into a decision tree without impairing the predictive performance of the original forest. The main objective is to examine GBDT methods, as the previous version of this algorithm was already found to be effective for ensembles of independently induced base trees [39]. It also enables us to analyze whether the algorithm performs differently for different ensemble types. The evaluation focuses…

Conclusion and future work

This paper presented a method that builds a decision tree that approximates the predictive performance of a pre-trained ensemble of trees (namely, a decision forest). The developed method is an extension of the work presented in [39]. This method is compatible with both independently induced forests (e.g., random forest) and dependently induced forests (e.g., XGBoost and GBM), and not only the former type, as was the case for the previous version of this work. Among the extensions that were added to the…

CRediT authorship contribution statement

Omer Sagi: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing - original draft. Lior Rokach: Conceptualization, Methodology, Validation, Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (50)

  • A. Abdul et al., Trends and trajectories for explainable, accountable and intelligible systems: an HCI research agenda
  • A. Adadi et al., Peeking inside the black-box: a survey on explainable artificial intelligence (XAI), IEEE Access (2018)
  • Y. Akiba, S. Kaneda, H. Almuallim, Turning majority voting classifiers into a single decision tree, in: Tools with...
  • C. Apté et al., Data mining with decision trees and decision rules, Future Gener. Comput. Syst. (1997)
  • A.B. Arrieta et al., Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI, Inf. Fusion (2020)
  • O. Bastani, C. Kim, H. Bastani, Interpretability via model extraction, arXiv preprint arXiv:1706.09773,...
  • R.K. Bellamy et al., Think your artificial intelligence software is fair? Think again, IEEE Softw. (2019)
  • L. Breiman, Classification and regression trees (2017)
  • T. Chen et al., XGBoost: a scalable tree boosting system
  • X. Chen et al., EGBMMDA: extreme gradient boosting machine for miRNA-disease association prediction, Cell Death Disease (2018)
  • Z. Chen et al., XGBoost classifier for DDoS attack detection and analysis in SDN-based cloud
  • A. Chouldechova et al., A case study of algorithm-assisted decision making in child maltreatment hotline screening decisions
  • M. Craven, J.W. Shavlik, Extracting tree-structured representations of trained networks, in: Advances in neural...
  • J. Demšar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. (2006)
  • T.G. Dietterich, Ensemble methods in machine learning, in: International workshop on multiple classifier systems,...
  • P. Domingos, Knowledge discovery via multiple models, Intell. Data Anal. (1998)
  • W. Fan, F. Chu, H. Wang, P.S. Yu, Pruning and dynamic scheduling of cost-sensitive ensembles, in: AAAI/IAAI, 2002, pp....
  • J.H. Friedman, Greedy function approximation: a gradient boosting machine, Ann. Stat. (2001)
  • R. Guidotti et al., A survey of methods for explaining black box models, ACM Comput. Surveys (2018)
  • J. Hatwell et al., CHIRPS: explaining random forest classification, Artif. Intell. Rev. (2020)
  • H. He et al., A novel ensemble method for credit scoring: adaption of different imbalance ratios, Expert Syst. Appl. (2018)
  • Q. Hu et al., EROS: ensemble rough subspaces, Pattern Recogn. (2007)
  • X. Jiang, C.-A. Wu, H. Guo, Forest pruning based on branch importance, Comput. Intell. Neurosci....
  • S. Kandula et al., Reappraising the utility of Google Flu Trends, PLoS Comput. Biol. (2019)