Explanation with the Winter value: Efficient computation for hierarchical Choquet integrals
Introduction
The ability to explain AI algorithms is key in many domains [4]. We are interested in explaining decisions evaluated on several criteria. Multi-Criteria Decision Aiding (MCDA) aims to help a decision maker evaluate a set of alternatives on the basis of multiple, potentially conflicting criteria [24]. A classical end goal for the user is to select one alternative among several. A very versatile MCDA model is the Hierarchical Choquet Integral (HCI) model [62], [10], [9]. It is composed of a set of Choquet integrals organized in a hierarchy, where the hierarchy comes from domain knowledge and eases the interpretability of the model. The Choquet integral generalizes the weighted sum and can capture various forms of interaction among criteria [16].
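To make the last statement concrete, the following is a minimal sketch of a Choquet integral over scores in [0, 1], using the usual sorting-based formula. The names `choquet`, `additive` and `min_like` and the dictionary encoding of the capacity are illustrative assumptions and do not come from the paper.

```python
from typing import Dict, FrozenSet, Sequence

# Assumption: a capacity is stored as a dict from sets of criterion indices to [0, 1].
Capacity = Dict[FrozenSet[int], float]


def choquet(x: Sequence[float], mu: Capacity) -> float:
    """Choquet integral of the score vector x with respect to the capacity mu."""
    order = sorted(range(len(x)), key=lambda i: x[i])  # criteria by ascending score
    total, prev = 0.0, 0.0
    for k, i in enumerate(order):
        upper = frozenset(order[k:])  # criteria whose score is at least x[i]
        total += (x[i] - prev) * mu[upper]
        prev = x[i]
    return total


# An additive capacity reproduces the weighted sum with weights 0.4 and 0.6.
additive: Capacity = {
    frozenset(): 0.0,
    frozenset({0}): 0.4,
    frozenset({1}): 0.6,
    frozenset({0, 1}): 1.0,
}

# A capacity that only rewards the full set reproduces the minimum
# (both criteria must be satisfied at once).
min_like: Capacity = {
    frozenset(): 0.0,
    frozenset({0}): 0.0,
    frozenset({1}): 0.0,
    frozenset({0, 1}): 1.0,
}

x = [0.2, 0.7]
weighted_sum_value = choquet(x, additive)
minimum_value = choquet(x, min_like)
```

With the additive capacity the result coincides with 0.4 × 0.2 + 0.6 × 0.7, while the second capacity returns the smaller of the two scores, which is one simple form of interaction between criteria.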
Example 1 The supervision of a metro line requires monitoring the satisfaction of passengers daily. It is measured from several quality criteria: P (Punctuality of the trains w.r.t. timetable), CT (number of Canceled Trains), R (Regularity: mean time between two successive trains) and TTT (Train Travel Time: average journey time inside a train). These criteria are assessed against two types of hours (PH: Peak Hour; OPH: Off-Peak Hour), and three line segments on the metro line (M1, M2 and M3). In the end, there are 4 × 2 × 3 = 24 elementary criteria. The criteria are organized hierarchically – see Fig. 1. For readability, the organization of the four elementary criteria (P, …, TTT) is shown only for node PH-M1, but they are also present in the other nodes, in place of “⋯”. The top node PQoS represents the Passengers' Quality of Service. The second level is a decomposition regarding the hour (PH vs. OPH). The next level separates the evaluation of each segment. For each hour type and segment, the four criteria (P, …, TTT) are also organized hierarchically. P and CT are related to the Planned Schedule (PS), whereas R and TTT are related to the Overall Train Travel (OTT). ■
In Example 1, the supervision operator wishes to know whether the current situation is preferred (according to the PQoS utility) to a situation of the past. As an explanation, the operator needs to understand which nodes in the tree are at the origin of this preference. This is obtained by computing an index measuring the influence of each node in the tree on the preference between two options. This influence index can be computed for a leaf node such as PH-M1-P, or for an intermediate node such as PH-M1, as all nodes in the tree make sense to the user. There are many connections with Feature Attribution (FA) in Machine Learning. Computing the level of contribution of a feature in a classification black-box model or that of a criterion in an MCDA model is indeed similar.
The Shapley value is one of the leading concepts for FA [75]. Unlike our situation, feature attribution only computes the influence of the leaves of a model. It has been argued that the Shapley value is not appropriate on trees [47] when we are interested in the contribution level of not only the leaves but also the other nodes. A specific value for trees, called the Winter value, has been defined [76], [47]. The Winter value takes the form of a recursive call of the Shapley value at several nodes in the hierarchy. Our influence index applied to any node in a tree is thus obtained by the Winter value [47].
The main drawback of the Shapley and the Winter values is that their expressions contain an exponential number of terms in the number of inputs. Several approaches have been proposed to approximate the computation of the Shapley value [12], [53] – see Section 6. The drawback of these methods is that it is hard to have accurate and reliable bounds of the error made. We explore other avenues in this paper. This paper aims to propose efficient methods to compute the Winter value on hierarchical Choquet integrals. Section 2 recalls the background on MCDA, the Choquet integral, and the Winter value.
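The exponential cost can be seen in a direct implementation. The sketch below computes the exact Shapley value of each criterion for the preference of an alternative `x` over an alternative `y`, assuming the commonly used game v(S) = u(x on S, y elsewhere). That game, the function name `shapley_compare` and the example utility are assumptions made for illustration; this is not the paper's algorithm.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def shapley_compare(
    u: Callable[[Sequence[float]], float],
    x: Sequence[float],
    y: Sequence[float],
) -> List[float]:
    """Exact Shapley value of each criterion for the difference u(x) - u(y)."""
    n = len(x)

    def v(S: Set[int]) -> float:
        # Mixed alternative: scores of x on the criteria in S, scores of y elsewhere.
        return u([x[i] if i in S else y[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # size of the coalition joined by criterion i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                S = set(subset)
                phi[i] += weight * (v(S | {i}) - v(S))
    return phi


# Illustrative utility: a weighted sum with hypothetical weights.
weights = [0.5, 0.3, 0.2]


def weighted_sum(z: Sequence[float]) -> float:
    return sum(w * zi for w, zi in zip(weights, z))


x = [1.0, 0.5, 0.2]
y = [0.0, 0.5, 1.0]
phi = shapley_compare(weighted_sum, x, y)
```

For each criterion the inner loops visit all 2^(n-1) subsets of the other criteria, which is why this direct form becomes impractical beyond a few dozen inputs. For a weighted sum, the result reduces to w_i (x_i - y_i), and the values always add up to u(x) - u(y).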
The first efficient method considers the case in which there are many criteria (possibly several hundreds or thousands) organized flatly – see Section 3. As the Choquet integral is a linear combination of minimum functions, we focus on the min function. We show that the Winter value of a min function can be computed with a linear number of operations. Section 3.3 considers the problem of interpreting an HCI model, which amounts to explaining the difference between two particular alternatives (namely alternatives with extreme values). A very simple and intuitive formula is given.
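The decomposition into minimum functions mentioned above is the standard Möbius representation of the Choquet integral. The sketch below illustrates that representation only; it does not reproduce the linear-time formula of Section 3, and the helper names and the example capacity are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Iterator, Sequence

Capacity = Dict[FrozenSet[int], float]


def subsets(items: Iterable[int]) -> Iterator[FrozenSet[int]]:
    """Yield every subset of items, including the empty set."""
    pool = list(items)
    for k in range(len(pool) + 1):
        for combo in combinations(pool, k):
            yield frozenset(combo)


def mobius(mu: Capacity, n: int) -> Capacity:
    """Moebius transform: m(S) = sum over T in S of (-1)^(|S| - |T|) * mu(T)."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * mu[T] for T in subsets(S))
        for S in subsets(range(n))
    }


def choquet_from_mobius(x: Sequence[float], m: Capacity) -> float:
    """Choquet integral written as a linear combination of min functions."""
    return sum(coef * min(x[i] for i in S) for S, coef in m.items() if S)


# Hypothetical capacity on two criteria with a positive interaction.
mu: Capacity = {
    frozenset(): 0.0,
    frozenset({0}): 0.3,
    frozenset({1}): 0.5,
    frozenset({0, 1}): 1.0,
}
m = mobius(mu, 2)
value = choquet_from_mobius([0.2, 0.7], m)
```

Here the Möbius coefficients are 0.3, 0.5 and 0.2 for {0}, {1} and {0, 1}, so the integral is 0.3 × 0.2 + 0.5 × 0.7 + 0.2 × min(0.2, 0.7). Because an attribution that is linear in the utility can be computed term by term, understanding the min function is enough to handle the whole sum.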
Section 4 presents a novel algorithm computing the Winter value for an HCI model. We first note that the Winter value (and also the Shapley value) can be written as an average added value of a given attribute over a combinatorial structure. Taking inspiration from Branch & Bound (BB) algorithms, we develop a method that prunes the exploration of the combinatorial structure when computing the average. An advantage of our algorithm over existing ones is that it is exact. We develop this idea for the case where the aggregation functions are Choquet integrals. A major difficulty is handling the hierarchy of aggregation models. Our algorithm is not a simple adaptation of BB, since BB algorithms compute the optimal solution in a combinatorial structure whereas we need to compute the average value of a quantity over all leaves of the tree.
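The "average added value" view can be sketched by brute force. The code below averages the marginal contribution of each leaf over orderings of the leaves. With a flat tree all orderings are used, which gives the Shapley value. With a nested tree only orderings that keep the leaves of every subtree contiguous are used, which is the ordering-based description usually given for the Winter value of leaves. This is an unpruned illustration under those assumptions, not the paper's algorithm, and the tree encoding and function names are hypothetical.

```python
from itertools import permutations, product
from typing import Callable, Iterator, List, Sequence, Tuple, Union

# Assumption: a leaf is a criterion index, an aggregation node is a list of children.
Tree = Union[int, List["Tree"]]


def consistent_orders(tree: Tree) -> Iterator[Tuple[int, ...]]:
    """Yield leaf orderings in which the leaves of every subtree stay contiguous."""
    if isinstance(tree, int):
        yield (tree,)
        return
    for perm in permutations(tree):
        child_orders = [list(consistent_orders(child)) for child in perm]
        for parts in product(*child_orders):
            yield tuple(leaf for part in parts for leaf in part)


def average_added_value(
    u: Callable[[Sequence[float]], float],
    x: Sequence[float],
    y: Sequence[float],
    tree: Tree,
) -> List[float]:
    """Average marginal contribution of each leaf when moving from y to x."""
    phi = [0.0] * len(x)
    count = 0
    for order in consistent_orders(tree):
        z = list(y)
        prev = u(z)
        for i in order:  # switch the criteria from y to x one at a time
            z[i] = x[i]
            cur = u(z)
            phi[i] += cur - prev
            prev = cur
        count += 1
    return [p / count for p in phi]


ones, zeros = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
flat = average_added_value(min, ones, zeros, [0, 1, 2])
nested = average_added_value(min, ones, zeros, [[0, 1], 2])
```

For the min function, the flat tree shares the gain equally (one third each), while the nested tree `[[0, 1], 2]` first splits it equally between the two top-level branches and then inside the first branch. Every ordering is enumerated here, which is exactly the exploration that a pruning strategy aims to avoid.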
Finally, for a particular class of Choquet integrals (namely the 2-additive model) that is widely used in practice, we experimentally demonstrate in Section 5 the efficiency of our algorithm on a large number of randomly generated trees and models.
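For reference, a standard way of writing the 2-additive Choquet integral is in terms of importance indices and pairwise interaction indices. The symbols $v_i$ and $I_{ij}$ below are assumptions introduced for illustration, since this excerpt does not fix a notation for them.

```latex
C_\mu(x) \;=\; \sum_{i \in A} v_i \, x_i \;-\; \frac{1}{2} \sum_{\{i,j\} \subseteq A} I_{ij} \, \lvert x_i - x_j \rvert ,
\qquad
\sum_{i \in A} v_i = 1, \quad
v_i - \frac{1}{2} \sum_{j \neq i} \lvert I_{ij} \rvert \;\ge\; 0 \ \ \text{for all } i \in A .
```

The model thus needs only one coefficient per criterion and one per pair of criteria, and it reduces to a weighted sum when all interaction indices are zero.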
Multi-attribute preference and aggregation models
We are interested in representing the preferences of a decision maker regarding how to compare several alternatives described by a finite set A of attributes. Attribute $i \in A$ is depicted by a set $X_i$, and alternatives are elements of $X_A$, where $X_B := \prod_{i \in B} X_i$ for any $B \subseteq A$. Preferences are assumed to be modeled by a numerical utility $u : X_A \to \mathbb{R}$. For $x \in X_A$ and $B \subseteq A$, $x_B$ denotes the restriction of x on B.
To be interpretable, model u is organized hierarchically, as in Fig. 1. We consider a rooted tree whose
Approach “computation over the top node first while fixing the other levels”
Following Miller's law [58], an aggregation node is easily understandable by a decision maker if it has a relatively small number of children (say between 2 and 6). Most hierarchical MCDA models developed in applications satisfy this property. However, it might arise in practice that some nodes contain a very large number of inputs.
Example 4 The number of children of aggregation nodes j=PH and j=OPH is the number of metro line segments (3 in the example). In practice, if we consider a large city with 20
Computation of the explanation for the HCI model by an approach “enumerate over top node first”
We present in this section a new algorithm to compute the influence index (6) of a given criterion when all aggregation functions are general Choquet integrals. We first give some pruning and recursive properties on the influence index and then we describe the exact algorithm. We proceed in the opposite way compared to Section 3. Instead of fixing all levels but the first one, we start by simplifying the aggregation function at the first level (decomposition w.r.t. additivity, reduction
Experimental analysis of the computation time
In this section, we compare the computation time of WINT-P with that of WINT. In the general case, we cannot obtain interesting bounds between these two algorithms. In the best case, WINT-P can prune the tree at the highest level and requires only a few operations, while WINT is exponential in the number of criteria. In the worst case, there might be no pruning in WINT-P, so that the whole tree needs to be explored, as for WINT. However, there are extra-computation
Related works
In this work, we have considered a general class of MCDA models, namely the HCI. This concept is similar to the multiple criteria hierarchy process for Choquet integrals [3]. Some decomposability properties of HCI are described in [27].
Several papers have proposed to learn this model. In [13], [40], a combination of genetic algorithm and neural network is used to learn the structure and the parameters of an HCI model. The elicitation of a Choquet integral and its utility functions based on linear
Conclusion
The Winter value, which is an extension of the Shapley value on trees, is used to identify which attributes mostly contribute to the decision. A bottleneck for the practical usage of these two values is their exponential number of terms. We have proposed two approaches to compute this value.
The first one is dedicated to the particular case where the root node has a large number of children. As the Choquet integral is a linear combination of minimum functions and the Winter value is linear in
Proof of Theorem 1
Before giving the proof of this result, let us start with elementary results.
Lemma 8 For , we have
Proof Let with . For , we have . For , we write for Hence as , . ■
Proof of Theorem 1 First of all, if there exists
CRediT authorship contribution statement
Christophe Labreuche: Conceptualization, Formal analysis, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing.
Declaration of Competing Interest
The author reports was provided by Thales SA. Christophe Labreuche, H. Pouyllau and B. Goujon have patent #BET 17P3607 licensed to Thales SA.
Acknowledgements
This paper is supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No 825619 (AI4EU Project).
References (79)
- et al., Generating and evaluating evaluative arguments, Artif. Intell. (2006)
- et al., Polynomial calculation of the Shapley value based on sampling, Comput. Oper. Res. (2009)
- et al., Integration of genetic algorithms and neural networks for the formation of the classifier of the hierarchical Choquet integral, Inf. Sci. (2020)
- et al., A linear approximation method for the Shapley value, Artif. Intell. J. (2008)
- The application of fuzzy integrals in multicriteria decision making, Eur. J. Oper. Res. (1996)
- et al., A review of capacity identification methods for Choquet integral based multi-attribute utility theory: applications of the Kappalab R package, Eur. J. Oper. Res. (2008)
- et al., Choquet integral-based hierarchical networks for evaluating customer service perceptions on fast food stores, Expert Syst. Appl. (2010)
- Estimation of the weights of interacting criteria from the set of profiles by means of information-theoretic functionals, Eur. J. Oper. Res. (2004)
- A general framework for explaining the results of a multi-attribute preference model, Artif. Intell. (2011)
- et al., Neuro-inspired edge feature fusion using Choquet integrals, Inf. Sci. (2021)
- Explaining individual predictions when features are dependent: more accurate approximations to Shapley values
- Explaining deep neural networks with a polynomial time algorithm for Shapley value approximation
- Multiple criteria hierarchy process for the Choquet integral
- Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI, Inf. Fusion
- Accountable approval sorting
- Comparing options with argument schemes powered by cancellation
- Choquet integral-based measures of economic welfare and species diversity, Int. J. Intell. Syst.
- Fuzzy ensemble of deep learning models using Choquet fuzzy integral, coalition game and information theory for breast cancer histology classification, Expert Syst. Appl.
- On the identifiability of hierarchical decision models
- Neural representation and learning of hierarchical 2-additive Choquet integrals
- Explaining models by propagating Shapley values of local components
- L-Shapley and C-Shapley: efficient model interpretation for structured data
- Theory of capacities, Ann. Inst. Fourier
- Feature selection based on the Shapley value
- Algorithmic transparency via quantitative input influence: theory and experiments with learning systems
- Identification of the Choquet integral parameters in the interaction index domain by means of sparse modeling
- On the complexity of cooperative solution concepts, Math. Oper. Res.
- Application of the Choquet integral: a case study on a personnel selection problem, Sustainability
- EUROCONTROL Specification for ATM Surveillance System Performance (Volume 1)
- Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability
- Shapley values for feature selection: the good, the bad, and the axioms, IEEE Access
- Hierarchical decomposition of the Choquet integral
- Dominance rules for the Choquet integral in multiobjective dynamic programming
- Data Shapley: equitable valuation of data for machine learning
- Neuron Shapley: discovering the responsible neurons
- Shapley-Lorenz explainable artificial intelligence
- A decade of application of the Choquet and Sugeno integrals in multi-criteria decision aid, Ann. Oper. Res.
- A survey of methods for explaining black box models, ACM Comput. Surv.