Explanation with the Winter value: Efficient computation for hierarchical Choquet integrals

https://doi.org/10.1016/j.ijar.2022.09.008

Abstract

Multi-Criteria Decision Aiding arises in many industrial applications where the user needs an explanation of the recommendation. We consider, in particular, an explanation taking the form of a contribution level assigned to each variable. Decision models are often hierarchical, and the influence is computed by the Winter value, which is an extension of the Shapley value on trees. The contribution of the paper is to propose two exact methods to efficiently compute the Winter values for a very general class of decision models known as the Choquet integral. The first one is an analytical expression for a flat model. The second one is an exact algorithm for a hierarchical model. The main idea of this algorithm is to prune the combinatorial structure on which the Winter value is computed, based on the upper and lower bounds of the utility on subtrees. Extensive simulations show that this new algorithm provides very significant computation gains compared to the state of the art.

Introduction

The ability to explain AI algorithms is key in many domains [4]. We are interested in explaining decisions evaluated on several criteria. Multi-Criteria Decision Aiding (MCDA) aims at helping a decision maker to evaluate a set of alternatives on the basis of multiple criteria potentially conflicting with each other [24]. A classical end goal for the user is to select one alternative among several. A very versatile MCDA model is the Hierarchical Choquet Integral (HCI) model [62], [10], [9]. The latter is composed of a set of Choquet integrals organized hierarchically, where the hierarchy comes from domain knowledge and eases the interpretability of the model. The Choquet integral generalizes the weighted sum and can capture various forms of interaction among criteria [16].
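To make the aggregation step concrete, here is a minimal sketch of the standard (sorting-based) discrete Choquet integral for a single node. The function name and the encoding of the capacity as a callable over frozensets are our own illustrative choices, not the paper's notation.

```python
def choquet(x, mu):
    """Discrete Choquet integral of scores x (assumed in [0, 1]) w.r.t. a
    capacity mu, where mu maps frozensets of criterion indices to [0, 1],
    with mu(frozenset()) = 0 and mu(all criteria) = 1.

    The integral sums the increments between successive sorted scores,
    each weighted by the capacity of the set of criteria whose score is
    at least the current one.
    """
    n = len(x)
    order = sorted(range(n), key=lambda j: x[j])  # ascending scores
    total, prev = 0.0, 0.0
    remaining = set(range(n))
    for j in order:
        total += (x[j] - prev) * mu(frozenset(remaining))
        prev = x[j]
        remaining.discard(j)
    return total

# With an additive capacity, the Choquet integral reduces to a weighted sum.
weights = [0.3, 0.7]
additive = lambda S: sum(weights[j] for j in S)
print(choquet([0.2, 0.5], additive))  # 0.3*0.2 + 0.7*0.5 = 0.41
```

A non-additive capacity would instead reward or penalize criteria that are satisfied jointly, which is how interaction among criteria is captured.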

Example 1

The supervision of a metro line requires monitoring the satisfaction of passengers daily. It is measured from several quality criteria: P (Punctuality of the trains w.r.t. timetable), CT (number of Canceled Trains), R (Regularity: mean time between two successive trains) and TTT (Train Travel Time: average journey time inside a train). These criteria are assessed against two types of hours (PH: Peak Hour; OPH: Off-Peak Hour), and three line segments on the metro line (M1, M2 and M3). In the end, there are 4×3×2=24 elementary criteria.

The criteria are organized hierarchically – see Fig. 1. The organization of the four elementary criteria (P, …,TTT) is only shown for node PH-M1 for readability reasons. But they are also present in the other nodes, in place of “⋯”. The top node PQoS represents the Passengers' Quality of Service. The second level is a decomposition regarding the hour (PH vs. OPH). The next level separates the evaluation of each segment. For each hour type and segment, the four criteria (P, …,TTT) are also organized hierarchically. P and CT are related to the Planned Schedule (PS), whereas R and TTT are related to the Overall Train Travel (OTT).  

In Example 1, the supervision operator wishes to know whether the current situation is preferred (according to the PQoS utility) to a situation of the past. As an explanation, the operator needs to understand which nodes in the tree are at the origin of this preference. This is obtained by computing an index measuring the influence of each node in the tree on the preference between two options. This influence index can be computed for a leaf node such as PH-M1-P, or for an intermediate node such as PH-M1, as all nodes in the tree make sense to the user. There are many connections with Feature Attribution (FA) in Machine Learning. Computing the level of contribution of a feature in a classification black-box model or that of a criterion in an MCDA model is indeed similar.

The Shapley value is one of the leading concepts for FA [75]. Unlike our situation, feature attribution only computes the influence of leaves in a model. It has been argued that the Shapley value is not appropriate on trees [47], when we are interested in knowing the contribution level of not only the leaves but also other nodes. A specific value for trees – called the Winter value – has been defined [76], [47]. The Winter value takes the form of a recursive call of the Shapley value at several nodes in the hierarchy. Our influence index, applied to any node in a tree, is thus given by the Winter value [47].

The main drawback of the Shapley and the Winter values is that their expressions contain an exponential number of terms in the number of inputs. Several approaches have been proposed to approximate the computation of the Shapley value [12], [53] – see Section 6. The drawback of these methods is that it is hard to have accurate and reliable bounds of the error made. We explore other avenues in this paper. This paper aims to propose efficient methods to compute the Winter value on hierarchical Choquet integrals. Section 2 recalls the background on MCDA, the Choquet integral, and the Winter value.
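To make the combinatorial blow-up concrete, a direct implementation of the Shapley value enumerates all coalitions. The sketch below is our own illustrative code (not one of the paper's methods); it needs 2^(n-1) evaluations of the game per player, hence exponential cost in n.

```python
from itertools import combinations
from math import factorial

def shapley_values(n, v):
    """Exact Shapley values of a cooperative game v over players 0..n-1.

    phi_i = sum over S not containing i of
            |S|! (n-|S|-1)! / n! * (v(S u {i}) - v(S)).
    Enumerates every such coalition S, so the cost is exponential in n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for s in range(len(others) + 1):
            w = factorial(s) * factorial(n - s - 1) / factorial(n)
            for S in combinations(others, s):
                phi[i] += w * (v(set(S) | {i}) - v(set(S)))
    return phi

# For an additive game, the Shapley value of each player is its own weight.
weights = [1.0, 2.0, 3.0]
print(shapley_values(3, lambda S: sum(weights[j] for j in S)))
# approximately [1.0, 2.0, 3.0]
```

The efficiency axiom (the values sum to v of the grand coalition) also holds, which is a useful sanity check on any implementation.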

The first efficient method considers the case in which there are many criteria (possibly several hundreds or thousands) organized flatly – see Section 3. As the Choquet integral is a linear combination of minimum functions, we focus on the min function. We show that the Winter value of a min function can be computed by a linear number of operations. Section 3.3 considers the problem of interpreting an HCI model, which amounts to explaining the difference between two particular alternatives (namely alternatives with extreme values). A very simple and intuitive formula is given.
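The decomposition invoked here is the classical Möbius representation: any Choquet integral can be written as C(x) = Σ_B m(B) min_{j∈B} x_j, where m is the Möbius transform of the capacity. The sketch below (our own code, using our own function names) computes m by inclusion-exclusion and evaluates the min-combination on a two-criteria capacity with positive interaction.

```python
from itertools import combinations

def mobius(mu, n):
    """Möbius transform of a capacity mu on criteria 0..n-1:
    m(B) = sum over C subset of B of (-1)^(|B|-|C|) mu(C)."""
    m = {}
    for b in range(n + 1):
        for B in combinations(range(n), b):
            Bset = frozenset(B)
            m[Bset] = sum((-1) ** (len(Bset) - c) * mu(frozenset(C))
                          for c in range(len(B) + 1)
                          for C in combinations(B, c))
    return m

def choquet_min_form(x, m):
    """Choquet integral as a linear combination of min functions:
    C(x) = sum over nonempty B of m(B) * min_{j in B} x_j."""
    return sum(coef * min(x[j] for j in B)
               for B, coef in m.items() if B)

# Capacity on two criteria with positive interaction between them.
cap = {frozenset(): 0.0, frozenset({0}): 0.3,
       frozenset({1}): 0.5, frozenset({0, 1}): 1.0}
m = mobius(cap.__getitem__, 2)
print(m[frozenset({0, 1})])             # interaction term, ~ 0.2
print(choquet_min_form([0.2, 0.5], m))  # 0.3*0.2 + 0.5*0.5 + 0.2*0.2 ~ 0.35
```

Since the Winter value is linear in the game, computing it for each min term and summing with the Möbius coefficients reduces the flat case to the min function studied in Section 3.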

Section 4 presents a novel algorithm computing the Winter value for an HCI model. We first note that the Winter value (and also the Shapley value) can be written as an average added value of a given attribute over a combinatorial structure. Taking inspiration from Branch & Bound (BB) algorithms, we develop a method that prunes the exploration of the combinatorial structure when computing the average. An advantage of our algorithm compared to the existing ones is that it is exact. We develop this idea when the aggregation functions are Choquet integrals. A major difficulty is handling the hierarchy of aggregation models. Our algorithm is not a simple adaptation of BB, as the latter computes an optimal solution over a combinatorial structure, whereas we need to compute the average value of a quantity over all leaves of the tree.
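The pruning principle can be illustrated on a toy problem of averaging leaf values of a tree: when the lower and upper bounds over a subtree coincide, every leaf below contributes the same amount, so the subtree is accounted for in one step without being expanded. The classes and names below are our own toy sketch; the paper's algorithm applies this idea to upper and lower bounds of the utility on subtrees of the HCI model.

```python
class Node:
    """Tree node caching leaf count and bounds on the leaf values below it."""
    def __init__(self, value=None, children=None):
        self.children = children or []
        if self.children:
            self.n = sum(c.n for c in self.children)
            self.lo = min(c.lo for c in self.children)
            self.hi = max(c.hi for c in self.children)
        else:
            self.n, self.lo, self.hi = 1, value, value

def pruned_sum(node, visited):
    """Sum of all leaf values below `node`, pruning constant subtrees."""
    if node.lo == node.hi:           # all leaves below are identical:
        return node.lo * node.n      # account for them without expanding
    visited.append(node)
    return sum(pruned_sum(c, visited) for c in node.children)

leaves = [Node(value=v) for v in (1.0, 1.0, 1.0, 1.0, 2.0, 4.0)]
tree = Node(children=[Node(children=leaves[:4]),   # constant subtree: pruned
                      Node(children=leaves[4:])])
visited = []
avg = pruned_sum(tree, visited) / tree.n
print(avg)            # (4*1 + 2 + 4) / 6, i.e. ~1.667
print(len(visited))   # only 2 internal nodes expanded, no leaf visited
```

The difficulty the paper addresses, which this toy hides, is obtaining cheap yet tight bounds when the leaf values come from nested Choquet integrals.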

Finally, for a particular class of Choquet integrals (namely the 2-additive model) that is widely used in practice, we experimentally demonstrate in Section 5 the efficiency of our algorithm on a large number of randomly generated trees and models.


Multi-attribute preference and aggregation models

We are interested in representing the preferences of a decision maker regarding how to compare several alternatives described by a finite set A of attributes. Attribute j ∈ A is depicted by a set X_j, and alternatives are elements of X_A, where X_B := ×_{j∈B} X_j for any B ⊆ A. Preferences are assumed to be modeled by a numerical utility u : X_A → ℝ. For x ∈ X_A and B ⊆ A, x_B denotes the restriction of x to B.

To be interpretable, model u is organized hierarchically, as in Fig. 1. We consider a rooted tree T whose

Approach “computation over the top node first while fixing the other levels”

Following Miller's law [58], an aggregation node is easily understandable by a decision maker if it has a relatively small number of children (say between 2 and 6). Most hierarchical MCDA models developed in applications satisfy this property. However, it might arise in practice that some nodes contain a very large number of inputs.

Example 4 (continuation of Example 1)

The number of children of aggregation nodes j=PH and j=OPH is the number of metro line segments (3 in the example). In practice, if we consider a large city with 20

Computation of the explanation for the HCI model by an approach “enumerate over top node first”

We present in this section a new algorithm to compute the influence index (6) of a given criterion iA when all aggregation functions Fj are general Choquet integrals. We first give some pruning and recursive properties on the influence index and then we describe the exact algorithm. We proceed in the opposite way compared to Section 3. Instead of fixing all levels but the first one, we start by simplifying the aggregation function at the first level (decomposition w.r.t. additivity, reduction

Experimental analysis of the computation time

This section compares the computation time of WINT-P with that of WINT. In the general case, we cannot obtain interesting bounds relating these two algorithms. In the best case, WINT-P can prune the tree at the highest level and requires only a few operations, while WINT is exponential in the number of criteria. In the worst case, there might be no pruning in WINT-P, so that the whole tree N_1 × ⋯ × N_q needs to be explored, as in WINT. However, there are extra-computation

Related works

We have considered in this work a general class of MCDA models – namely the HCI. This concept is similar to the multiple criteria hierarchy process for Choquet integrals [3]. Some decomposability properties of HCI are described in [27].

Several papers have proposed to learn this model. In [13], [40], a combination of genetic algorithm and neural network is used to learn the structure and the parameters of an HCI model. The elicitation of a Choquet integral and its utility functions based on linear

Conclusion

The Winter value, which is an extension of the Shapley value on trees, is used to identify which attributes mostly contribute to the decision. A bottleneck for the practical usage of these two values is their exponential number of terms. We have proposed two approaches to compute this value.

The first one is dedicated to the particular case where the root node has a large number of children. As the Choquet integral is a linear combination of minimum functions and the Winter value is linear in

Proof of Theorem 1

Before giving the proof of this result, let us start with elementary results.

Lemma 8

For $m \le n$, we have
$$\sum_{T \subseteq K} \frac{(s+t)!\,(n-s-t-1)!}{n!} = \frac{s!\,(n-s-m-1)!}{(n-m)!}$$
where $t = |T|$ and $m = |K|$.

Proof

Let $F_{m,n}^{s} := \sum_{T \subseteq K} \frac{(s+t)!\,(n-s-t-1)!}{n!}$ with $m \le n$, writing $t = |T|$ and $m = |K|$. For $K = \emptyset$, we have $F_{0,n}^{s} = \frac{s!\,(n-s-1)!}{n!}$.

For $K \ne \emptyset$, we write, for $i \in K$,
$$
\begin{aligned}
F_{m,n}^{s} &= \sum_{T \subseteq K \setminus \{i\}} \frac{(s+t)!\,(n-s-t-1)!}{n!} + \sum_{\substack{T' \subseteq K \\ T' = T \cup \{i\}}} \frac{(s+t+1)!\,(n-s-t-2)!}{n!} \\
&= \sum_{T \subseteq K \setminus \{i\}} \frac{(s+t)!\,(n-s-t-2)!}{n!}\,\bigl[(n-s-t-1) + (s+t+1)\bigr] \\
&= \sum_{T \subseteq K \setminus \{i\}} \frac{(s+t)!\,(n-s-t-2)!}{(n-1)!} \\
&= F_{m-1,n-1}^{s}
\end{aligned}
$$
Hence, as $m \le n$, $F_{m,n}^{s} = F_{m-1,n-1}^{s} = \cdots = F_{0,n-m}^{s} = \frac{s!\,(n-m-s-1)!}{(n-m)!}$. □
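As a sanity check, the identity of Lemma 8 can be verified numerically by brute-force enumeration of the subsets $T \subseteq K$ (our own illustrative code, with $t = |T|$ and $m = |K|$):

```python
from itertools import combinations
from math import factorial

def lemma8_lhs(n, s, m):
    """Sum over all T subset of K (|K| = m) of (s+t)!(n-s-t-1)!/n!."""
    K = range(m)
    return sum(factorial(s + len(T)) * factorial(n - s - len(T) - 1)
               / factorial(n)
               for t in range(m + 1) for T in combinations(K, t))

def lemma8_rhs(n, s, m):
    """Closed form of the lemma: s!(n-s-m-1)!/(n-m)!."""
    return factorial(s) * factorial(n - s - m - 1) / factorial(n - m)

print(lemma8_lhs(6, 1, 2), lemma8_rhs(6, 1, 2))  # both equal 1/12
```

The check requires $n - s - m - 1 \ge 0$ so that all factorials are defined, consistent with the hypothesis $m \le n$ of the lemma.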

Proof of Theorem 1

First of all, if there exists $j \in A \setminus \{i\}$

CRediT authorship contribution statement

Christophe Labreuche: Conceptualization, Formal analysis, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing.

Declaration of Competing Interest

The author reports that support was provided by Thales SA. Christophe Labreuche, H. Pouyllau, and B. Goujon have patent #BET 17P3607 licensed to Thales SA.

Acknowledgements

This paper is supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No 825619 (AI4EU project).

References (79)

  • K. Aas et al., Explaining individual predictions when features are dependent: more accurate approximations to Shapley values
  • M. Ancona et al., Explaining deep neural networks with a polynomial time algorithm for Shapley value approximation
  • S. Angilella et al., Multiple criteria hierarchy process for the Choquet integral
  • A.B. Arrieta et al., Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI, Inf. Fusion (2020)
  • K. Belahcène et al., Accountable approval sorting
  • K. Belahcène et al., Comparing options with argument schemes powered by cancellation
  • G. Beliakov et al., Choquet integral-based measures of economic welfare and species diversity, Int. J. Intell. Syst. (2022)
  • P. Bhowal et al., Fuzzy ensemble of deep learning models using Choquet fuzzy integral, coalition game and information theory for breast cancer histology classification, Expert Syst. Appl. (2022)
  • R. Bresson et al., On the identifiability of hierarchical decision models
  • R. Bresson et al., Neural representation and learning of hierarchical 2-additive Choquet integrals
  • H. Chen et al., Explaining models by propagating Shapley values of local components
  • J. Chen et al., L-Shapley and C-Shapley: efficient model interpretation for structured data
  • G. Choquet, Theory of capacities, Ann. Inst. Fourier (1953)
  • S. Cohen et al., Feature selection based on the Shapley value
  • A. Datta et al., Algorithmic transparency via quantitative input influence: theory and experiments with learning systems
  • H. Evangelista de Oliveira et al., Identification of the Choquet integral parameters in the interaction index domain by means of sparse modeling
  • X. Deng et al., On the complexity of cooperative solution concepts, Math. Oper. Res. (1994)
  • S. Dumnić et al., Application of the Choquet integral: a case study on a personnel selection problem, Sustainability (2022)
  • ESASSP, EUROCONTROL Specification for ATM Surveillance System Performance (Volume 1) (2012)
  • C. Frye et al., Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability
  • D. Fryer et al., Shapley values for feature selection: the good, the bad, and the axioms, IEEE Access (2021)
  • K. Fujimoto et al., Hierarchical decomposition of the Choquet integral
  • L. Galand et al., Dominance rules for the Choquet integral in multiobjective dynamic programming
  • A. Ghorbani et al., Data Shapley: equitable valuation of data for machine learning
  • A. Ghorbani et al., Neuron Shapley: discovering the responsible neurons
  • P. Giudici et al., Shapley-Lorenz explainable artificial intelligence
  • M. Grabisch et al., A decade of application of the Choquet and Sugeno integrals in multi-criteria decision aid, Ann. Oper. Res. (2010)
  • R. Guidotti et al., A survey of methods for explaining black box models, ACM Comput. Surv. (2018)
This paper is an extended version of the conference paper [50].