Approximating XGBoost with an interpretable decision tree
Introduction
The increased deployment of machine-learning models in new domains has recently accelerated a broad discussion on the importance of model interpretability [5]. In some domains, experts are required either to understand the mechanism by which a model works or to be able to justify decisions that are based on its outputs. It is not sufficient, for example, for a medical diagnosis model to be accurate; it must also be transparent to the expert who uses its output when making a decision about a particular patient. Even in scenarios that allow a larger margin of error, humans are less likely to accept models that they cannot comprehend. Interpretable machine-learning models address these issues. They are defined as models that can be clearly visualized or explained to the end-user in plain text [29]. A decision tree is a widely used example of an interpretable model: every classification made by a decision tree can be associated with a corresponding decision path, and the hierarchical structure of the model as a whole can be easily visualized or described to users of any level of expertise [4]. Despite their high level of interpretability, however, decision trees have limited predictive performance due to the myopic nature of their induction algorithms. These algorithms usually fail to capture complex interactions among the input features, leading to fundamental biases when such interactions exist. This issue can be addressed by training an ensemble of decision trees, also known as a decision forest.
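The decision-path transparency described above can be sketched with scikit-learn (an illustrative example, not part of the paper's method): each prediction of a fitted tree maps to a readable chain of split conditions.

```python
# Illustrative sketch: extracting the decision path that justifies a single
# prediction from a scikit-learn decision tree.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# decision_path returns the tree nodes visited by each sample; internal
# nodes can be rendered as human-readable split conditions.
node_indicator = clf.decision_path(X[:1])
feature, threshold = clf.tree_.feature, clf.tree_.threshold
for node_id in node_indicator.indices:
    if feature[node_id] >= 0:  # negative values mark leaves
        op = "<=" if X[0, feature[node_id]] <= threshold[node_id] else ">"
        print(f"feature[{feature[node_id]}] {op} {threshold[node_id]:.2f}")
```

The printed conjunction of at most three conditions is the full justification of the prediction — exactly the property that a forest of many trees loses.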
Decision forests combine multiple decision trees to provide a single output in supervised machine-learning tasks. Their ability to integrate different hypotheses in a single model and their robustness across relational datasets have driven their popularity within the data science community [37]. Gradient Boosting Decision Trees (GBDT) are a sub-group of decision forests that includes models such as XGBoost, CatBoost, and LightGBM. These models have recently been found to be highly effective in numerous tasks, as reflected by the fact that most of Kaggle's recent winners used them in their solutions. However, decision forests as a whole, and GBDT models in particular, are considered black-box models. Each classification made by a decision forest must traverse numerous different trees, so the end-user cannot obtain a clear justification of the model's predictions. Furthermore, it is impossible for the end-user to grasp the model structure, as it is effectively composed of numerous individual models. A plethora of studies have addressed the complexity of decision forests by presenting ensemble pruning approaches, which aim to filter a subset of base trees that performs at least as well as the original decision forest [28]. While these methods reduce complexity without impairing predictive performance, the end result cannot be considered an interpretable model. Several studies, mainly from the past few years, have presented methods for transforming a decision forest into a single decision tree. Some of these methods require the synthesis of a large set of unlabelled data [16], [50], while others propose algorithms that manipulate the ensemble nodes into a new decision tree [42], [43], [45].
This paper presents a novel method for generating a single decision tree from a previously trained decision forest. The generated tree aims to approximate the predictive performance of the decision forest while providing an explanatory mechanism for its classifications, enabling the end-user to understand its structure. This work can be viewed as an extension of the Forest-Based Tree (FBT) method, which was developed and evaluated for independently induced decision forests, i.e., decision forests in which the base trees are trained independently [39] and which usually consist of a small number of deep trees rather than a large number of shallow trees. The main contribution of the new method is its applicability to both independently induced decision forests (e.g., random forest) and dependently induced decision forests (e.g., gradient boosting machines and XGBoost). This is achieved by refining the algorithm to focus on extracting information that is relevant to the original training set rather than considering only the explicit characteristics of the base trees. Another important contribution is that the method allows the tree complexity to be configured by setting a maximum depth, so its users can better control the trade-off between interpretability and predictive performance. The method includes three main stages. First, we apply ensemble pruning to the pre-trained ensemble. Then we extract a representative set of conjunctions from the pruned ensemble, and finally, we build a decision tree that organizes the conjunction set in a tree structure. The remainder of the paper is structured as follows: Section 2 lays the scientific background and presents related studies. Section 3 presents the developed method. Section 4 presents an experimental evaluation on binary classification challenges and discusses its results. Section 5 concludes and suggests future research directions.
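As a rough illustration of the first of the three stages, ensemble pruning can be sketched as a greedy forward selection over base trees (a minimal sketch using scikit-learn's random forest; the majority-vote scoring and greedy loop are assumptions for illustration, not the authors' exact procedure):

```python
# Minimal sketch of ensemble pruning: greedily keep base trees whose
# addition does not hurt majority-vote accuracy on a validation set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

def vote_acc(trees):
    # accuracy of the majority vote of the selected base trees
    votes = np.mean([t.predict(X_val) for t in trees], axis=0)
    return np.mean((votes >= 0.5) == y_val)

pruned, best = [], 0.0
for t in forest.estimators_:  # forward selection over the base trees
    if vote_acc(pruned + [t]) >= best:
        pruned.append(t)
        best = vote_acc(pruned)
print(len(pruned), "trees kept, validation accuracy:", best)
```

The pruned subset then serves as the input to the subsequent conjunction-extraction stage.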
Section snippets
Background
Decision forests, and in particular gradient boosting decision trees (GBDT), are considered the best practice in many classification challenges [11], [38]. However, interpretable models such as decision trees are usually preferred over decision forests when either the model or its predictions must be transparent to the end-user. Building a decision tree that approximates the predictive performance of a given decision forest, with a focus on GBDT models, is the subject of this paper and in the…
Converting gradient boosting decision tree into a single tree
This section presents a method for generating a decision tree from a given decision forest. The presented method includes extensions and refinements to the Forest-Based Tree (FBT) method presented in [39]. The main refinement adapts the conjunction-set generation stage, which breaks the decision forest into numerous building blocks. This stage was refined to consider the properties of the training data (e.g., class distribution, conjunctions that are relevant to the training instances) in…
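The notion of breaking a tree into conjunctions can be illustrated as follows (a simplified sketch, not the refined FBT procedure: it enumerates the root-to-leaf rules of a single fitted tree and keeps only those covered by at least one training instance):

```python
# Sketch: enumerate root-to-leaf conjunctions of a fitted tree and filter
# them by coverage of the training data.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
t = clf.tree_

def conjunctions(node=0, path=()):
    # recursively collect (feature, op, threshold) triples down to each leaf
    if t.feature[node] < 0:  # leaf
        yield path
        return
    f, thr = t.feature[node], t.threshold[node]
    yield from conjunctions(t.children_left[node], path + ((f, "<=", thr),))
    yield from conjunctions(t.children_right[node], path + ((f, ">", thr),))

rules = list(conjunctions())
# keep only conjunctions satisfied by at least one training instance
covered = [r for r in rules
           if any(all((x[f] <= thr) if op == "<=" else (x[f] > thr)
                      for f, op, thr in r) for x in X)]
print(len(rules), "rules,", len(covered), "covered by training data")
```

In a forest, conjunctions from different base trees are merged, which is where coverage-based filtering against the training set becomes essential to keep the set tractable.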
Experimental evaluation
Here we present an empirical evaluation of the developed algorithm by testing its ability to convert a GBDT model into a decision tree without impairing the predictive performance of the original forest. The main objective is to examine GBDT methods, as the previous version of this algorithm was already found to be effective for ensembles of independently induced base trees [39]. This also enables us to analyze whether the algorithm performs differently for different ensemble types. The evaluation focuses…
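The shape of such an evaluation can be sketched as follows (a hedged illustration using scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost and a naive distillation baseline, not the FBT algorithm itself): train the ensemble, fit a depth-limited tree to approximate it, and compare test accuracy and fidelity.

```python
# Sketch of the evaluation setup: ensemble vs. an interpretable surrogate.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gbdt = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
# naive baseline: fit a depth-limited tree on the ensemble's own labels
surrogate = DecisionTreeClassifier(max_depth=5, random_state=0)
surrogate.fit(X_tr, gbdt.predict(X_tr))

print("GBDT test accuracy:     ", gbdt.score(X_te, y_te))
print("surrogate test accuracy:", surrogate.score(X_te, y_te))
print("fidelity to the GBDT:   ", surrogate.score(X_te, gbdt.predict(X_te)))
```

Accuracy measures predictive performance; fidelity measures how faithfully the single tree reproduces the ensemble's decisions — both matter when judging an approximation.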
Conclusion and future work
This paper presented a method that builds a decision tree approximating the predictive performance of a pre-trained ensemble of trees (namely, a decision forest). The developed method is an extension of the work presented in [39]. It is compatible with both independently induced forests (e.g., random forest) and dependently induced forests (e.g., XGBoost and GBM), and not only with the former type, as was the previous version of this work. Among the extensions that were added to the…
CRediT authorship contribution statement
Omer Sagi: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing - original draft. Lior Rokach: Conceptualization, Methodology, Validation, Supervision, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (50)
- Trends and trajectories for explainable, accountable and intelligible systems: an HCI research agenda
- Peeking inside the black-box: a survey on explainable artificial intelligence (XAI), IEEE Access (2018)
- Y. Akiba, S. Kaneda, H. Almuallim, Turning majority voting classifiers into a single decision tree, in: Tools with...
- Data mining with decision trees and decision rules, Future Gener. Comput. Syst. (1997)
- Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI, Inf. Fusion (2020)
- O. Bastani, C. Kim, H. Bastani, Interpretability via model extraction, arXiv preprint arXiv:1706.09773
- Think your artificial intelligence software is fair? Think again, IEEE Softw. (2019)
- Classification and regression trees (2017)
- XGBoost: a scalable tree boosting system
- EGBMMDA: extreme gradient boosting machine for miRNA-disease association prediction, Cell Death Disease (2018)
- XGBoost classifier for DDoS attack detection and analysis in SDN-based cloud
- A case study of algorithm-assisted decision making in child maltreatment hotline screening decisions
- Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res.
- Knowledge discovery via multiple models, Intell. Data Anal.
- Greedy function approximation: a gradient boosting machine, Ann. Stat.
- A survey of methods for explaining black box models, ACM Comput. Surveys
- CHIRPS: explaining random forest classification, Artif. Intell. Rev.
- A novel ensemble method for credit scoring: adaption of different imbalance ratios, Expert Syst. Appl.
- EROS: ensemble rough subspaces, Pattern Recogn.
- Reappraising the utility of Google Flu Trends, PLoS Comput. Biol.