1 Introduction

Machine learning models in Artificial Intelligence (AI) have found application in increasingly sensitive and diverse areas such as speech recognition, image classification, biology, and medicine. When deploying a machine learning classifier, one has to take into consideration several potential issues such as overfitting, fragility to adversarial attacks, and over-parameterization. These well-known weaknesses highlight the underlying complexity of the generalization problem and have been addressed by several scholars in the field, who leverage additional learning tools such as distillation and dataset enrichment [7, 11].

A prominent recent research area is Explainable AI, which, instead of addressing model complexity in an ante-hoc fashion, subsumes it in human-understandable explanations. In this setting the objective is to explain the decisions of “black box” machine learning classifiers [6]. Explanations are a powerful tool that enables model inspection [22], validation [5], and human-in-the-loop systems [14]. Explainability has also gained attention from institutional bodies, which recently put into law the General Data Protection Regulation (GDPR). Besides giving people control over their personal data, the GDPR places restrictions on automated decision-making processes. It introduces a right to a “meaningful explanation”: an individual has the right to obtain “meaningful information about the logic involved” when automated decision making takes place [17, 19, 24].

In spite of the common interest and effort in the explainability field, the formal definition of what an explanation is remains an open question [18]. However, the research community is converging towards a small set of families of explanations (Sect. 2). For tabular data, which is the focus of our work, explanations can take the form of prototypes [3, 13], that is, samples representative of some cluster of interest; sets of relevant features [1, 12]; or decision rules [23, 27]. In our work, we focus on rule-based explanations. Single-instance explanations, also known as local explanations, have shown promising results in approximating the behavior and motivating the decisions of black box models, even though they seldom outperform global interpretable-by-design models.

In light of these results we introduce the local-to-global problem (Sect. 3), a generalization problem which aims to relax the locality constraints of single-instance explanations [19]. It is based on the idea of deriving a global explanation by subsuming local logical rules. We propose to address this problem with a scoring system that subsumes a given set of local interpretable decision rules into a smaller set, which is then used to perform predictions and to describe the overall logic of the black box model (Sect. 4). In particular, we aim to derive explanations with a good trade-off among the following properties. Conciseness describes the succinctness of an explainable model: a concise model is composed of a small number of rules. Completeness identifies the validity boundaries of explanations: complete models provide the user with explanations for a large number of instances. Finally, complexity measures the inherent complexity of an explanation. We take these properties into account in the scoring system by defining a local Rule Relevance Score (rrs). We empirically show the effectiveness of the proposed explanation method by explaining the decisions of two different black box models on four datasets in which each entry represents a person (Sect. 5). Thanks to the aforementioned properties, the local-to-global scoring system using the rrs is able to compete with and outperform a set of baselines and explainable-by-design models.

2 Related Work

We report in this section some of the most relevant explainability techniques, with a focus on our area of application, tabular datasets. Our scope is rule-based classifiers and rule generation/selection algorithms.

There are two main actors in an explainability problem: an opaque classifier, also called black box, whose behavior must be explained, and a dataset used to train the explainable model [6]. Explanation algorithms and models can be split into two branches: local and global explanation methods [10]. The former provide explanations of the model behavior on a single prediction, while the latter provide explanations of the whole model behavior. In this setting, local explainability problems operate on an available dataset consisting of a single instance. Local explanation algorithms tend to follow either a neighborhood or a candidate approach. Given a black box, a distance measure, and a record x, neighborhood approaches generate a synthetic neighborhood of x, then exploit an interpretable algorithm (such as a decision tree or a rule-based classifier) to extract a local explanation from it. lime [22] and lore [9] tackle the neighborhood generation through input perturbation and genetic algorithms, respectively. Candidate approaches instead focus on greedily exploring the problem space. anchors [23] generates a starting one-premise rule, then iteratively adds relevant premises by leveraging multi-armed bandit algorithms. Global explanation algorithms instead leverage the whole dataset and try to explain the overall logic of the black box classifier with explainable-by-design models. trepan [4], for instance, is a revised decision tree that jointly optimizes gain ratio and fidelity to the given black box. This feature allows it to reduce erroneous splits and dampen overfitting in the deeper levels of the tree.

While the methods discussed above approximate either the local or the global behavior of a black box, interpretable classifiers are explainable by design [8, 10] and are meant to substitute the black box in the classification task. However, their interpretability generally comes at the cost of lower performance than that of black boxes. Decision trees like C4.5 [20] are probably the best-known family of interpretable models. Another large family is that of rule-based classifiers like foil [21] and cpar [27], which operate by iteratively generating detailed rulesets. Restricting ourselves to rule-based algorithms, there are several recent proposals in the literature. Decision sets [15] and MUSE [16] optimize an objective function balancing accuracy and complexity of the output ruleset, thus yielding a set of sorted and mutually exclusive rules. In [26] the authors introduce Scalable Bayesian Rule Lists (sbrl), i.e., a Bayesian model to filter a given ruleset. The authors set up a prior distribution over the output ruleset, bounded in the number of premises per rule and the size of the ruleset; the posterior is then addressed with a probabilistic scheme. A Bayesian formulation is also applied by Falling Rule Lists [25], where the ruleset is updated with random operations such as premise swapping, replacement, addition, and removal. Finally, corels [2] introduces an algorithmically bounded ruleset construction procedure with a strong emphasis on optimality.

The explanation methods reviewed above operate by generating either local rules (lore, anchors) or global rules (cpar, foil, corels, sbrl, etc.). The problem we address is instead that of subsuming a set of local rules into a set of global ones, guaranteeing high fidelity to the black box and low complexity of the explanation for a better understanding. Note that the problem we deal with extracts explanations from other explanations, rather than directly from the data, as is the case for the above global models. As a consequence, to the best of our knowledge, our proposal is conceptually different from all those existing in the literature. However, in the experiment section we also test existing methods as a replacement for the proposed one.

3 Problem Formulation

We first recall basic notations on classification and explanation. Afterwards, we define the local-to-global explanation problem for which we propose a solution.

We name black box b a non-interpretable classification model, such as a neural network or a random forest. It is defined as a function \(b:\mathcal {X}^{(m)} \rightarrow \mathcal {Y}\) which maps records x from a feature space \(\mathcal {X}^{(m)}\) with m input features to a decision y in a target space \(\mathcal {Y}\). We write \(b(x) = y\) to denote the decision y predicted by b, and \(b(X) = Y\) as a shorthand for \(\{b(x) \ |\ x \in X\} = Y\). An instance x consists of a set of m attribute-value pairs \((a_i, v_i)\), where \(a_i\) is a feature (or attribute) and \(v_i\) is a value from the domain of \(a_i\). We assume that b can be queried at will. Given b and an instance x for which the outcome \(b(x)=y\) has to be explained, we model a local explanation e of such a decision as a decision rule \(r = p \rightarrow y\), where each premise \(p_i \in p\) is associated with a feature \(a_i\) and a range \([v_i^{(l)}, v_i^{(u)}]\). We can now formalize the local-to-global explanation problem as follows:

Definition 1 (Local-to-Global Explanation)

Let b be a black box classifier, \({X = \{x_1, \dots , x_n\}}\) a set of instances, and \(R = \{r_1, \dots , r_n\}\) the set of rule-based local explanations of b for the instances in X. The local-to-global explanation problem consists in deriving from R an interpretable rule-based classifier approximating the global behavior of b.

Therefore, starting from a set of local explanations, our objective is to find a global interpretable classifier from which it is possible to understand the overall logic followed by the black box in taking its decisions.
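
For concreteness, the following Python sketch illustrates one possible encoding of a rule-based local explanation \(r = p \rightarrow y\); the dictionary-based representation, the feature names, and the example values are purely illustrative and not part of the formalization above.

def covers(rule, x):
    # rule: {"premises": {feature a_i: (v_l, v_u)}, "label": y}; closed ranges as in Sect. 3
    # x: a record given as feature -> value pairs (a_i, v_i)
    return all(lo <= x[a] <= up for a, (lo, up) in rule["premises"].items())

# A hypothetical rule explaining a decision b(x) = "High" on compas-like features.
r = {"premises": {"age": (18, 25), "priors_count": (3, 40)}, "label": "High"}
print(covers(r, {"age": 22, "priors_count": 5, "is_recid": 1}))   # True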

4 Scoring Methods and Rule Relevance Score

In this section, we describe a scoring system for solving the local-to-global explanation problem. The proposed approach can be summarized as follows. Given a set of rules R as local explanations of a black box classifier b, the scoring system calculates a score for each rule \(r_i \in R\). Then, it prunes out the rules with a score lower than a given threshold. The resulting set of rules \(R^* \subseteq R\) is the global explanation approximating the behavior of the black box b.

In particular, our target is to select a small set of rules which is nonetheless sufficiently expressive and precise to approximate the black box b, i.e., to extract from R a subset \(R^*\) rewarding the following properties. Firstly, generality: we wish for rules to be general, and hence applicable to large subsets of the dataset. The more general a rule set is, the larger the probability that a record in the dataset can be explained by it. Secondly, high accuracy: naturally, we wish for the predictions of the rule set to be accurate. Lastly, “outlier accuracy”: the results in [27] suggest that, in the solution space, accuracy and coverage are involved in a trade-off relationship. We wish to reward rules which capture rule-outliers, i.e., rules able to explain records matched by few other rules, as such records are outliers in the solution space of explanations. Moreover, rewarding such rules allows us to reduce the overlap between rules and to discard a large chunk of the most “obvious” rules. Measures embedding these properties can act as a proxy for model completeness, as highly general rule sets lower the probability of encountering non-explainable records. As a side effect, for a fixed rule set, general models tend to be simpler, since the more complex and detailed an explanation is, the lower its generality. Therefore, the effectiveness of the scoring system lies in the definition of a scoring function implementing the above properties.

In this proposal, we define the Rule Relevance Score (rrs). The proposed scoring formulation accounts for the required generality and accuracy constraints by weighting them in a tunable linear sum:

$$\begin{aligned} rrs _{R, X} = \alpha _1 c + \alpha _2 s + \alpha _3 a + \alpha _4 \widetilde{c} + \alpha _5 \widetilde{a} \end{aligned}$$
(1)

where c is a coverage score, s is a sparsity score, a is an association score, \(\widetilde{c}\) is a prediction coverage score, \(\widetilde{a}\) is a prediction association score, and \(\alpha _1, \dots , \alpha _5\) are tunable weights. Coverage and sparsity act as a proxy for model complexity: the longer a rule is, the lower its coverage. It also follows that high-coverage rulesets yield highly complete models: the larger the ruleset coverage, the more records can be explained. The score vectors are computed on a given ruleset R and a validation set X. Next, we detail each component of rrs defined in Eq. 1.
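
As a minimal illustration of Eq. 1, the following sketch combines the five score vectors (one entry per rule) into the rrs vector; the uniform default weights are an assumption made only for the example.

import numpy as np

def rule_relevance_score(c, s, a, c_tilde, a_tilde,
                         alphas=(1.0, 1.0, 1.0, 1.0, 1.0)):
    # Each argument is a score vector of length |R|; Eq. 1 is their weighted sum.
    scores = np.vstack([c, s, a, c_tilde, a_tilde])   # shape (5, |R|)
    return np.asarray(alphas) @ scores                # rrs vector, shape (|R|,)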

4.1 Coverage

Given a rule \(r = p \rightarrow y\) and a dataset X, we define the coverage of the rule r on X as the set of records \(x \in X\) that satisfy the premise of the rule, i.e.,

$$\begin{aligned} \varGamma (r, X) = \{x \in X \mid \forall a_i \in p,\ (a_i,v_i) \in x.\ v_i^{(l)} \le v_i < v_i^{(u)}\}. \end{aligned}$$
(2)

In addition, we call the associated ruleset of a record the inverse of the coverage function, that is, the set of rules whose premises the record satisfies. Moreover, we extend the notion of coverage to that of perfect coverage of a rule r with target y, that is, the subset of records covered and correctly predicted by r:

$$\begin{aligned} \widetilde{\varGamma }(r, X) = \{x \in X \mid x \in \varGamma (r, X) \wedge b(x) = y\}. \end{aligned}$$
(3)

The definition of perfect associated ruleset of a record is analogous to the non-perfect version and replaces the coverage function with its perfect extension.

We turn the above sets into the scores of the rrs formula as follows. Given a ruleset R, the coverage matrix \(C_{R, X}\) of R over X is a binary matrix such that \(C_{R, X}[i, j]=1\) if and only if the i-th rule \(r_i \in R\) covers the record \(x_j\), i.e., if \(x_j \in \varGamma (r_i, X)\).

It is then straightforward to define the coverage score vector \(c_{R, X}\) and the association score vector \(c^{-1}_{R, X}\) as the fraction of records covered by each rule and the fraction of rules covering each record, respectively:

$$\begin{aligned} c_{R, X} = 1/|X| \cdot C_{R, X} \cdot \mathbbm {1},\;\;\; c^{-1}_{R, X} = 1/|R| \cdot \mathbbm {1}^T \cdot C_{R, X} \end{aligned}$$
(4)

where \(\mathbbm {1}\) is a column vector of appropriate size with all entries equal to 1. The coverage score vector accounts for the normalized coverage of each rule over the records, while the association score vector accounts for the normalized number of rules covering each record.
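
The following sketch computes the coverage matrix and the two score vectors of Eq. 4, assuming the dictionary-based rule encoding used in the sketch of Sect. 3 (the half-open ranges follow Eq. 2).

import numpy as np

def coverage_matrix(rules, X):
    # C[i, j] = 1 iff the i-th rule covers record x_j (Eq. 2)
    return np.array([[int(all(lo <= x[a] < up for a, (lo, up) in r["premises"].items()))
                      for x in X] for r in rules])

def coverage_scores(C):
    n_rules, n_records = C.shape
    c = C.sum(axis=1) / n_records     # coverage score vector c_{R,X}, one entry per rule
    c_inv = C.sum(axis=0) / n_rules   # association score vector c^{-1}_{R,X}, one entry per record
    return c, c_inv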

4.2 Associated Rule Coverage

In order to also accommodate “outlier coverage”, i.e., the coverage of rarely covered records, we combine the coverage matrix \(C_{R, X}\) with the element-wise inverse of the association score vector \(c^{-1}_{R, X}\), resulting in the associated rule-coverage score vector \(a_{R, X}\):

$$\begin{aligned} a_{R, X} = C_{R, X} (c^{-1}_{R, X})^{-1}. \end{aligned}$$
(5)

This score aggregates, for each rule, the inverse of the associated ruleset cardinality over its covered records. Hence, rules covering less-covered records will tend to have a large associated rule-coverage.

We define the perfect coverage matrix \(\widetilde{C}_{R, X}\) using the \(\widetilde{\varGamma }\) operator, and in line with Eqs. 4 and 5 we name \(\widetilde{c}_{R, X}\) and \(\widetilde{a}_{R, X}\) the perfect coverage score and perfect associated rule-coverage of the rrs formula.
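
A minimal sketch of Eq. 5 under the element-wise reading given above; letting records covered by no rule contribute zero is an assumption made for numerical safety. The perfect variant is obtained by passing the perfect coverage matrix.

import numpy as np

def associated_rule_coverage(C, c_inv):
    # C: (perfect) coverage matrix of shape (|R|, |X|); c_inv: association score vector of shape (|X|,)
    inv = np.divide(1.0, c_inv, out=np.zeros_like(c_inv, dtype=float), where=c_inv > 0)
    return C @ inv   # a_{R,X}: large for rules covering rarely covered records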

4.3 Sparsity

Coverage is not the only way to account for how a ruleset spreads over the data. We also account for the distance among the records covered by a rule through the average pairwise distance of the covered records. Let \(D_X\) be the symmetric pairwise distance matrix whose element (i, j) holds the distance between record i and record j; we define sparsity as:

$$\begin{aligned} s_{R, X}[i] = \frac{1}{|\varGamma (r_i, X)|^2} \sum _{x_j, x_k \in \varGamma (r_i, X)} D_X[j, k]. \end{aligned}$$
(6)
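
A minimal sketch of the sparsity score as described above, i.e., the average pairwise distance among the records covered by each rule; the loop-based computation is illustrative rather than an exact implementation.

import numpy as np

def sparsity_scores(C, D):
    # C: coverage matrix (|R| x |X|); D: symmetric pairwise distance matrix (|X| x |X|)
    s = np.zeros(C.shape[0])
    for i, row in enumerate(C):
        idx = np.flatnonzero(row)                # indices of the records covered by rule i
        if idx.size > 0:
            sub = D[np.ix_(idx, idx)]
            s[i] = sub.sum() / idx.size ** 2     # average pairwise distance (diagonal is zero)
    return s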

4.4 Model Explanation and Prediction

The model explanation comprises two phases: a pruning phase, which extracts a global set of rules from a set of local ones, and a prediction phase, which employs the global set of rules to classify a given instance.

Pruning. Given a set R of local rules and a validation set X, we calculate the rrs vector. Then, we extract a subset of rules \(R^*\) from R by pruning out the rules having an rrs lower than a threshold. As threshold, we adopt a percentile of the values in the rrs vector: given \(\beta \), we prune R to \(R^*\) by removing all \(r \in R\) with a score lower than the \(\beta ^{th}\) percentile. The ruleset \(R^*\) represents the global interpretation of the black box b provided by the rrs scoring system.
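
The pruning phase amounts to a percentile cut on the rrs vector, as in the following sketch.

import numpy as np

def prune(rules, rrs, beta=75):
    # Keep the rules whose score reaches the beta-th percentile of the rrs values.
    threshold = np.percentile(rrs, beta)
    return [r for r, score in zip(rules, rrs) if score >= threshold]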

Prediction. Given a set of relevant rules \(R^*\), a record x, and a validation set X, we adopt the Laplacian schema introduced in [27]: the prediction of \(R^*\) on x is the prediction of the rule with the highest Laplacian accuracy in the associated ruleset of x.
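
The following sketch illustrates the prediction phase. The Laplace accuracy formula (correct + 1)/(covered + #classes) is the usual formulation of the schema of [27] and is an assumption here, as is the fallback to a default label when no rule of \(R^*\) covers the record.

def covers(rule, x):
    # Dictionary-based rule encoding as in the sketch of Sect. 3.
    return all(lo <= x[a] < up for a, (lo, up) in rule["premises"].items())

def laplace_accuracy(rule, X, b, n_classes=2):
    # b is the black box, used here as a callable returning the predicted label.
    matched = [x for x in X if covers(rule, x)]
    correct = sum(1 for x in matched if b(x) == rule["label"])
    return (correct + 1) / (len(matched) + n_classes)

def predict(x, relevant_rules, X, b, default_label=None):
    associated = [r for r in relevant_rules if covers(r, x)]   # associated ruleset of x
    if not associated:
        return default_label                                   # default rule (e.g., majority label)
    best = max(associated, key=lambda r: laplace_accuracy(r, X, b))
    return best["label"]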

5 Experiments

In this section we present an array of experiments showing the validity of the proposed solution. In particular, we show the effectiveness of the scoring system using rrs in subsuming an optimal set of rules, compared with baseline scores and with state-of-the-art rule-based explainable-by-design methods.

Table 1. Dataset cardinality and encoded dimensionality.

5.1 Experimental Setting

We selected a set of standard binary classification tasks with datasets pre-processed in a one-hot format: adult is a dataset on future income prediction; churn is a Kaggle dataset on telephone plan subscription prediction; compas is a dataset on recidivism prediction; german is a dataset on creditor prediction. We split each dataset in a stratified fashion: 80% is used for training the black box classifiers, and we explain the remaining 20%, namely X. Table 1 reports basic information about the datasets. As black box classifiers, we report experiments explaining a Neural Network (NN) and a Random Forest (RF). As the initial set of local rules R we adopt the explanation rules extracted using the local-explanation method lore [9] on the dataset X. As validation set, we adopt the test set X from which we extract the local explanations.
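
The experimental pipeline can be sketched as follows under assumed defaults: the 80/20 stratified split is taken from the text, while the scikit-learn classifiers and their (default) hyper-parameters are illustrative choices, not necessarily those used here.

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

def setup_black_boxes(features, labels, seed=0):
    # Stratified 80/20 split: train the black boxes on 80%, explain the held-out 20% (the set X).
    F_train, F_expl, y_train, _ = train_test_split(
        features, labels, test_size=0.2, stratify=labels, random_state=seed)
    black_boxes = {
        "RF": RandomForestClassifier(random_state=seed).fit(F_train, y_train),
        "NN": MLPClassifier(random_state=seed).fit(F_train, y_train),
    }
    return F_expl, black_boxes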

In order to evaluate the requirements reported in the previous section, we validate the explanation methods using the following measures (a code sketch of these measures is given after the list).

  • \( fidelity (X, R, b) \in [0, 1]\), the fidelity of the interpretable model with respect to a given black box b on a dataset X. It indicates how well the interpretable model mimics the black box.

  • \( coverage (X, R) \in [0, 1]\), the normalized coverage of the interpretable model R on the given dataset X, i.e., \(c_{R,X}\). It indicates how many records the interpretable model is able to deal with.

  • \( hmean (X, R, b) \in [0, 1]\), the harmonic score, that is, the harmonic mean of fidelity and coverage, striking a balance between the two.

  • \( size (R) \in [0, +\infty )\), the conciseness of the interpretable model in terms of cardinality of the ruleset R, i.e., |R|.

  • \( len (R) \in [0, +\infty )\), the complexity of the interpretable model in terms of average number of conditions in the premises of the rules in R.
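
The sketch below gives a hedged reading of these measures: covers and predict_fn are assumed to follow the sketches of Sect. 4, and the model-level coverage is read as the fraction of records covered by at least one rule.

def fidelity(X, rules, b, predict_fn):
    # Fraction of records on which the ruleset prediction agrees with the black box b.
    return sum(1 for x in X if predict_fn(x, rules) == b(x)) / len(X)

def coverage(X, rules, covers):
    # Fraction of records covered by at least one rule.
    return sum(1 for x in X if any(covers(r, x) for r in rules)) / len(X)

def hmean(f, c):
    # Harmonic mean of fidelity and coverage.
    return 2 * f * c / (f + c) if f + c > 0 else 0.0

def size(rules):
    return len(rules)

def avg_len(rules):
    # Average number of premise conditions per rule.
    return sum(len(r["premises"]) for r in rules) / len(rules)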

As baselines, we compare the proposed rrs with a trivial fidelity-based scoring schema fs and with a coverage-based scoring schema cs. In practice, we replace the rrs adopted in the pruning phase of the proposed scoring system with fs or cs. Moreover, we compare the rrs scoring schema against state-of-the-art global rule-based explainable-by-design classifiers: cpar [27], corels [2] and sbrl [26]. In addition, we show that the global rules extracted by these classifiers and provided as input to the rrs scoring system do not guarantee the same performance as the local rules.

Fig. 1. Fidelity, coverage and harmonic score for rrs, fs and cs on local explanations extracted from a NN, for the different datasets, varying the pruning percentile threshold \(\beta \). The highest score is highlighted by a double marker.

5.2 Rule Relevance Score vs. Fidelity and Coverage Scores

In this section, we show the importance of using a compound score like rrs in the pruning phase of the scoring system instead of trivial scores like fs or cs.

Figure 1 shows how fidelity, coverage and harmonic score vary when varying the pruning percentile threshold \(\beta \) for the various datasets using the NN black box classifier. Results using the RF as black box are close to those obtained using the NN and are not reported due to lack of space.

Regardless of the score used (rrs, fs or cs), most datasets show increasingly higher fidelity at higher pruning factors. We attribute this behavior to a large number of poorly performing rules which sway the ensemble towards the wrong prediction. fs shows the highest and most stable fidelity across pruning factors; this pattern is probably due to the low usage of each rule. rrs and cs show almost no difference in terms of fidelity, with a slight increase, indicating that (i) the fidelity score does not play a crucial role in the pruning, and (ii) coverage may hinder the prediction performance at lower pruning factors.

The differences between fs, rrs and cs grow significantly when coverage, and hence the harmonic score, is measured. While both rrs and cs display a stable trend, fs dips in coverage between the \(50^{th}\) and \(80^{th}\) percentiles, regardless of the dataset. As suggested by the fidelity analysis, coverage does not seem to correlate with fidelity: the decrease in coverage for fs does not correspond to a decrease in fidelity. This suggests that most of the rules in R, and therefore in \(R^*\), are not useful in prediction, and thus that the fidelity measure strongly relies on the default rule with the majority target label.

5.3 Local vs Global Rules

In this section we compare the rrs scoring schema against the global rule-based classifiers cpar, corels and sbrl. Tables 2 and 3 report the harmonic score, the ruleset size as a proxy for conciseness (the lower the better), and the average rule length as a proxy for complexity (the lower the better) of the interpretable models, for the NN and RF explanations respectively. On the one hand, we have the scoring system with rrs subsuming the best local rules; on the other hand, we have the global rules from the explainable-by-design algorithms. rrs shows the highest harmonic score, at the cost of a ruleset that is neither particularly concise nor composed of particularly short rules. On the NN rules, cpar has an overall lower harmonic score and a larger and more complex ruleset than rrs. On the RF rules, instead, there is no clear winner. Vice versa, corels and sbrl provide a low-complexity, highly concise model at the cost of the harmonic score. Finally, it is worth underlining that rrs displays consistent and stable performance across all the metrics, independently of the dataset and the black box.

Table 2. Harmonic score, conciseness and complexity (in terms of ruleset size and average rule length, respectively) for rrs with \(\beta = 75\), and for the global interpretable models cpar, corels and sbrl explaining the NN black box.
Table 3. Harmonic score, conciseness and complexity (in terms of ruleset size and average rule length, respectively) for rrs with \(\beta = 75\), and for the global interpretable models cpar, corels and sbrl explaining the RF black box.

In Fig. 2 we show that, if we replace the local rules in the rrs scoring system with the global rules extracted by cpar, corels and sbrl, there is a clear drop in performance with respect to rrs. Analyzing fidelity and coverage, we observe that several methods show sub-par fidelity regardless of the rule filtering and, in some cases, fail to generate output rules (corels on german), with cpar being the best method after the scoring system with rrs. We attribute the poor performance of corels and sbrl to the low number of rules they generate (see Tables 2 and 3 for \(\beta = 0\)).

Fig. 2. Fidelity, coverage and harmonic score for rrs on local explanations and for the global methods, for NN, on the different datasets, varying the pruning percentile threshold \(\beta \). The highest score is highlighted by a double marker.

5.4 Qualitative Evaluation

In this section, we explore the rules employed by the rrs scoring system, cpar, corels and sbrl to explain the decisions on a sample of instances. In particular, we consider two instances \(x_1\) and \(x_2\) from the compas dataset for which, using the RF as black box, we have \(b(x_1) = High \) and \(b(x_2)= Low \).

Figure a: the two compas instances \(x_1\) and \(x_2\).

We report in the following the rules selected by each method to explain the black box decisions.

Figure b: the rules selected by rrs, cpar, corels and sbrl to explain \(b(x_1)\) and \(b(x_2)\).

We notice that, while all methods are able to capture significant features (age, priors count, past recidivism), rrs leverages longer and more detailed rules than cpar. This behavior is also empirically supported by the data shown in Tables 2 and 3, and is due to the local input rules, which are longer than the global ones. We leave the study of the effect of input rule length on rrs, as well as human-subject experiments, for future work.

6 Conclusion

In this paper, we have proposed a scoring system for explaining the global behavior of a black box classifier starting from a set of local explanations in the form of rules. To guarantee high performance and to account for important properties when selecting the most relevant rules, we have defined the Rule Relevance Score (rrs). We have compared rrs to baseline scores, finding comparable fidelity and significantly better performance in terms of coverage. We have also found that coverage does not correlate with fidelity. In addition, we have compared the rrs scoring system with state-of-the-art global explainers, observing that rrs has comparable performance but is much more stable across different datasets and black box models, both in terms of accountability and complexity. As future work, we plan to define more fine-grained filtering scores to further reduce the output size. Moreover, we would like to experiment with different local explanations. Finally, a case study involving real users would help to better assess the quality of the global explanations derived with our approach.