
Information Sciences

Volume 572, September 2021, Pages 522-542

Approximating XGBoost with an interpretable decision tree

https://doi.org/10.1016/j.ins.2021.05.055

Highlights

  • A GBDT model can be converted into a single decision tree.

  • The generated tree approximates the accuracy of its source forest.

  • The developed tree provides interpretable classifications, unlike GBDT models.

  • The generated tree outperforms CART-induced trees in terms of predictive performance.

  • The complexity of the tree can be configured by the method user.

Abstract

The increasing usage of machine-learning models in critical domains has recently stressed the necessity of interpretable machine-learning models. In areas like healthcare and finance, the model consumer must understand the rationale behind the model output in order to use it when making a decision. For this reason, it is impossible to use black-box models in these scenarios, regardless of their high predictive performance. Decision forests, and in particular Gradient Boosting Decision Trees (GBDT), are examples of this kind of model. GBDT models are considered the state-of-the-art in many classification challenges, reflected by the fact that the majority of Kaggle’s recent winners used GBDT methods (such as XGBoost) as a part of their solution. But despite their superior predictive performance, they cannot be used in tasks that require transparency. This paper presents a novel method for transforming a decision forest of any kind into an interpretable decision tree. The method extends the tool-set available for machine-learning practitioners who want to exploit the interpretability of decision trees without significantly impairing the predictive performance gained by GBDT models like XGBoost. We show in an empirical evaluation that in some cases the generated tree is able to approximate the predictive performance of an XGBoost model while enabling better transparency of the outputs.

Introduction

The increased deployment of machine-learning models in new domains has recently accelerated a broad discussion around the importance of model interpretability [5]. In some domains, experts are required either to understand the mechanism by which the model works or to be able to justify decisions that are based on model outputs. It is not sufficient, for example, for a medical diagnosis model to be accurate. It also needs to be transparent to the expert that uses its output when making a decision about a certain patient. But even in scenarios that allow a larger margin of error, humans are less likely to accept models that they cannot comprehend. Interpretable machine-learning models address these issues. They are defined as models that can be clearly visualized or explained to the end-user using plain text [29]. A decision tree is a broadly used example of an interpretable model. Every classification made by a decision tree can be associated with a corresponding decision path. In addition, the hierarchical structure of the model as a whole can be easily visualized or described to users of any level of expertise [4]. But despite their high level of interpretability, decision trees have limited predictive performance due to the myopic nature of their induction algorithms. These algorithms usually fail to capture complex interactions among the input features, leading to fundamental biases in cases where such interactions exist. This issue can be addressed by training an ensemble of decision trees, also known as a decision forest.
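
To make the notion of a decision path concrete, the following minimal sketch (using scikit-learn's public decision_path API, not code from the paper) prints the explicit root-to-leaf path that justifies a single prediction of a decision tree:

```python
# A minimal sketch (not code from the paper): every decision tree
# prediction maps to an explicit root-to-leaf decision path.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

sample = X[:1]
path = clf.decision_path(sample)          # sparse matrix of visited nodes
leaf = clf.apply(sample)[0]
feat, thr = clf.tree_.feature, clf.tree_.threshold

for node in path.indices:
    if node == leaf:
        print(f"leaf {node}: predict class {clf.predict(sample)[0]}")
    else:
        op = "<=" if sample[0, feat[node]] <= thr[node] else ">"
        print(f"node {node}: x[{feat[node]}] {op} {thr[node]:.3f}")
```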

Decision forests combine multiple decision trees to provide a single output in supervised machine-learning tasks. The ability of decision forests to integrate different hypotheses in a single model and their robustness to any type of relational dataset have driven their popularity within the data science community [37]. Gradient Boosting Decision Tree (GBDT) is a sub-group of decision forests that includes models like XGBoost, CatBoost, and LightGBM. These models have recently been found to be highly effective in numerous tasks, as reflected by the fact that most of Kaggle’s recent winners used these methods in their solutions. However, decision forests as a whole, and GBDT models in particular, are considered to be black-box models. Each classification made by a decision forest must traverse numerous different trees, so the end-user cannot obtain a clear justification of the predictions made by the model. Furthermore, it is impossible for the end-user to grasp the model structure, as it is practically composed of numerous single models. A plethora of studies have addressed the complexity of decision forests by presenting ensemble pruning approaches. These approaches aim at filtering a subset of base trees that performs at least as well as the original decision forest [28]. While these methods reduce the complexity without impairing the predictive performance, the end result cannot be considered an interpretable model. Several studies, mainly from the past few years, have presented methods for transforming a decision forest into a single decision tree. Some of these methods require the synthesis of a large set of unlabeled data [16], [50], while others propose algorithms that manipulate the ensemble nodes into a new decision tree [42], [43], [45].
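
The opacity described above is easy to quantify. The short sketch below (an illustration using the standard xgboost API, not the authors' code) shows that a single XGBoost prediction aggregates the outputs of hundreds of trees, leaving no single decision path to present to the end-user:

```python
# Illustration (not from the paper): one XGBoost prediction aggregates
# the scores of every tree in the ensemble.
import xgboost as xgb
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
model = xgb.XGBClassifier(n_estimators=300, max_depth=4).fit(X, y)

booster = model.get_booster()
print("trees consulted per prediction:", len(booster.get_dump()))

# pred_leaf=True returns, per instance, the leaf reached in each tree:
# 300 separate partial "explanations" instead of one decision path.
leaves = booster.predict(xgb.DMatrix(X[:1]), pred_leaf=True)
print("leaf indices per tree:", leaves.shape)   # -> (1, 300)
```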

This paper presents a novel method for generating a single decision tree that is based on a previously trained decision forest. The generated tree aims at approximating the predictive performance of the decision forest. At the same time, it provides an explanatory mechanism for its classifications, enabling the end-user to understand its structure. This work can be viewed as an extension of the Forest-Based Tree (FBT) method that was developed and evaluated for independently induced decision forests, i.e., decision forests in which the base trees are trained independently [39] and usually consist of a small number of deep trees rather than a large number of shallow trees. The main contribution of the new method is its applicability to both independently induced decision forests (e.g., random forest) and dependently induced decision forests (e.g., gradient boosting machines and XGBoost). This is done by refining the algorithm to focus on extracting information that is relevant to the original training set rather than considering only the explicit characteristics of the base trees. Another important contribution of the developed method is that it enables the configuration of the tree complexity by determining its maximum depth. Consequently, this configurable method allows its users to better control and address the trade-off between interpretability and predictive performance. The method includes three main stages. First, we apply ensemble pruning to the pre-trained ensemble. Then we extract a representative set of conjunctions from the pruned ensemble, and finally, we build a decision tree that organizes the conjunction set in a tree structure. The remainder of the paper is structured as follows: In Section 2 we lay the scientific background and present related studies. In Section 3, we present the developed method. Section 4 presents an experimental evaluation on binary classification challenges and discusses its results. Section 5 concludes and suggests future research directions.
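
As a rough, runnable point of reference for the stated goal (this is a generic distillation-style stand-in, not the conjunction-set method presented in this paper), one can refit a single depth-limited CART tree on the predictions of a trained XGBoost model and compare test accuracies:

```python
# Distillation-style stand-in (not the authors' algorithm): fit a single
# depth-limited tree on the forest's own training-set predictions.
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = xgb.XGBClassifier(n_estimators=200, max_depth=4).fit(X_tr, y_tr)

# Relabel the training set with the forest's outputs, then fit one tree
# whose maximum depth plays the role of the user-configured complexity.
tree = DecisionTreeClassifier(max_depth=5, random_state=0)
tree.fit(X_tr, forest.predict(X_tr))

print("forest accuracy:", accuracy_score(y_te, forest.predict(X_te)))
print("tree accuracy:  ", accuracy_score(y_te, tree.predict(X_te)))
```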

Section snippets

Background

Decision forests, and in particular gradient boosting decision trees (GBDT), are considered the best practice in many classification challenges [11], [38]. However, interpretable models like decision trees are usually preferred over decision forests when either the model or its predictions are required to be transparent to the end-user. Building a decision tree that approximates the predictive performance of a given decision forest, with a focus on GBDT models, is the subject of this paper and in the…

Converting gradient boosting decision tree into a single tree

This section presents a method for generating a decision tree from a given decision forest. The presented method includes extensions and refinements to the forest-based tree (FBT) method presented in [39]. The main refinement adapts the conjunction-set generation stage that breaks the decision forest into numerous building blocks. This stage was refined to consider the properties of the training data (e.g., class distribution, conjunctions that are relevant to the training instances) in…
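
To illustrate the conjunction building block this stage operates on, the following sketch (an illustrative reconstruction under our own simplifying assumptions, not the paper's implementation) enumerates the root-to-leaf conjunctions of each base tree of a GBDT and keeps only those satisfied by at least one training instance:

```python
# Illustrative reconstruction (our simplification, not the paper's code):
# each root-to-leaf path of a base tree is a conjunction of threshold
# conditions; keep only conjunctions matched by some training instance.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)
gbdt = GradientBoostingClassifier(n_estimators=10, max_depth=3,
                                  random_state=0).fit(X, y)

def leaf_conjunctions(tree):
    """Enumerate (conditions, leaf_value) pairs, one per leaf."""
    t = tree.tree_
    paths = []
    def walk(node, conds):
        if t.children_left[node] == -1:                 # leaf node
            paths.append((conds, t.value[node].ravel()))
            return
        f, thr = t.feature[node], t.threshold[node]
        walk(t.children_left[node],  conds + [(f, "<=", thr)])
        walk(t.children_right[node], conds + [(f, ">",  thr)])
    walk(0, [])
    return paths

def matched_by_training_data(conds, X):
    mask = np.ones(len(X), dtype=bool)
    for f, op, thr in conds:
        mask &= (X[:, f] <= thr) if op == "<=" else (X[:, f] > thr)
    return mask.any()

# gbdt.estimators_ holds one regression tree per boosting step (binary case).
conjunctions = [c for (est,) in gbdt.estimators_
                for c in leaf_conjunctions(est)
                if matched_by_training_data(c[0], X)]
print(f"kept {len(conjunctions)} training-relevant conjunctions")
```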

Experimental evaluation

Here we present an empirical evaluation of the developed algorithm by testing its ability to convert GBDT into a decision tree without impairing the predictive performance of the original forest. The main objective is to examine GBDT methods, as the previous version of this algorithm was already found to be effective for ensembles of independently induced base trees [39]. It also enables us to analyze whether the algorithm performs differently for different ensemble types. The evaluation focuses…

Conclusion and future work

This paper presented a method that builds a decision tree that approximates the predictive performance of a pre-trained ensemble of trees (namely, a decision forest). The developed method is an extension of the work presented in [39]. This method is compatible with both independently induced forests (e.g., random forest) and dependently induced forests (e.g., XGBoost and GBM), and not only the former type, as was the case for the previous version of this work. Among the extensions that were added to the…

CRediT authorship contribution statement

Omer Sagi: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing - original draft. Lior Rokach: Conceptualization, Methodology, Validation, Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (50)

  • A. Abdul et al., Trends and trajectories for explainable, accountable and intelligible systems: an HCI research agenda
  • A. Adadi et al., Peeking inside the black-box: a survey on explainable artificial intelligence (XAI), IEEE Access (2018)
  • Y. Akiba, S. Kaneda, H. Almuallim, Turning majority voting classifiers into a single decision tree, in: Tools with...
  • C. Apté et al., Data mining with decision trees and decision rules, Future Gener. Comput. Syst. (1997)
  • A.B. Arrieta et al., Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI, Inf. Fusion (2020)
  • O. Bastani, C. Kim, H. Bastani, Interpretability via model extraction, arXiv preprint arXiv:1706.09773,...
  • R.K. Bellamy et al., Think your artificial intelligence software is fair? Think again, IEEE Softw. (2019)
  • L. Breiman, Classification and regression trees (2017)
  • T. Chen et al., XGBoost: a scalable tree boosting system
  • X. Chen et al., EGBMMDA: extreme gradient boosting machine for miRNA-disease association prediction, Cell Death Disease (2018)
  • Z. Chen et al., XGBoost classifier for DDoS attack detection and analysis in SDN-based cloud
  • A. Chouldechova et al., A case study of algorithm-assisted decision making in child maltreatment hotline screening decisions
  • M. Craven, J.W. Shavlik, Extracting tree-structured representations of trained networks, in: Advances in neural...
  • J. Demšar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. (2006)
  • T.G. Dietterich, Ensemble methods in machine learning, in: International workshop on multiple classifier systems,...
  • P. Domingos, Knowledge discovery via multiple models, Intell. Data Anal. (1998)
  • W. Fan, F. Chu, H. Wang, P.S. Yu, Pruning and dynamic scheduling of cost-sensitive ensembles, in: AAAI/IAAI, 2002, pp....
  • J.H. Friedman, Greedy function approximation: a gradient boosting machine, Ann. Stat. (2001)
  • R. Guidotti et al., A survey of methods for explaining black box models, ACM Comput. Surveys (2018)
  • J. Hatwell et al., CHIRPS: explaining random forest classification, Artif. Intell. Rev. (2020)
  • H. He et al., A novel ensemble method for credit scoring: adaption of different imbalance ratios, Expert Syst. Appl. (2018)
  • Q. Hu et al., EROS: ensemble rough subspaces, Pattern Recogn. (2007)
  • X. Jiang, C.-A. Wu, H. Guo, Forest pruning based on branch importance, Comput. Intell. Neurosci....
  • S. Kandula et al., Reappraising the utility of Google Flu Trends, PLoS Comput. Biol. (2019)