Explanation with the Winter value: Efficient computation for hierarchical Choquet integrals

https://doi.org/10.1016/j.ijar.2022.09.008

Abstract

Multi-Criteria Decision Aiding arises in many industrial applications where the user needs an explanation of the recommendation. We consider, in particular, an explanation taking the form of a contribution level assigned to each variable. Decision models are often hierarchical, and the influence is computed by the Winter value, which is an extension of the Shapley value on trees. The contribution of the paper is to propose two exact methods to efficiently compute the Winter values for a very general class of decision models known as the Choquet integral. The first one is an analytical expression for a flat model. The second one is an exact algorithm for a hierarchical model. The main idea of this algorithm is to prune the combinatorial structure on which the Winter value is computed, based on the upper and lower bounds of the utility on subtrees. Extensive simulations show that this new algorithm provides very significant computation gains compared to the state of the art.

Introduction

The ability to explain AI algorithms is key in many domains [4]. We are interested in explaining decisions evaluated on several criteria. Multi-Criteria Decision Aiding (MCDA) aims at helping a decision maker to evaluate a set of alternatives on the basis of multiple criteria potentially conflicting with each other [24]. A classical end goal for the user is to select one alternative among several. A very versatile MCDA model is the Hierarchical Choquet Integral (HCI) model [62], [10], [9]. The latter is composed of a set of Choquet integrals organized hierarchically, where the hierarchy comes from domain knowledge and eases the interpretability of the model. The Choquet integral generalizes the weighted sum and can capture various forms of interaction among criteria [16].
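To make the aggregation step concrete, here is a minimal sketch of the standard (sorting-based) discrete Choquet integral for a single node. The function name and the encoding of the capacity as a callable over frozensets are our own illustrative choices, not the paper's notation.

```python
def choquet(x, mu):
    """Discrete Choquet integral of scores x (assumed in [0, 1]) w.r.t. a
    capacity mu, where mu maps frozensets of criterion indices to [0, 1],
    with mu(frozenset()) = 0 and mu(all criteria) = 1.

    The integral sums the increments between successive sorted scores,
    each weighted by the capacity of the set of criteria whose score is
    at least the current one.
    """
    n = len(x)
    order = sorted(range(n), key=lambda j: x[j])  # ascending scores
    total, prev = 0.0, 0.0
    remaining = set(range(n))
    for j in order:
        total += (x[j] - prev) * mu(frozenset(remaining))
        prev = x[j]
        remaining.discard(j)
    return total

# With an additive capacity, the Choquet integral reduces to a weighted sum.
weights = [0.3, 0.7]
additive = lambda S: sum(weights[j] for j in S)
print(choquet([0.2, 0.5], additive))  # 0.3*0.2 + 0.7*0.5 = 0.41
```

A non-additive capacity would instead reward or penalize criteria that are satisfied jointly, which is how interaction among criteria is captured.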

Example 1

The supervision of a metro line requires monitoring the satisfaction of passengers daily. It is measured from several quality criteria: P (Punctuality of the trains w.r.t. timetable), CT (number of Canceled Trains), R (Regularity: mean time between two successive trains) and TTT (Train Travel Time: average journey time inside a train). These criteria are assessed against two types of hours (PH: Peak Hour; OPH: Off-Peak Hour), and three line segments on the metro line (M1, M2 and M3). In the end, there are 4×3×2=24 elementary criteria.

The criteria are organized hierarchically – see Fig. 1. The organization of the four elementary criteria (P, …,TTT) is only shown for node PH-M1 for readability reasons. But they are also present in the other nodes, in place of “⋯”. The top node PQoS represents the Passengers' Quality of Service. The second level is a decomposition regarding the hour (PH vs. OPH). The next level separates the evaluation of each segment. For each hour type and segment, the four criteria (P, …,TTT) are also organized hierarchically. P and CT are related to the Planned Schedule (PS), whereas R and TTT are related to the Overall Train Travel (OTT).  

In Example 1, the supervision operator wishes to know whether the current situation is preferred (according to the PQoS utility) to a situation of the past. As an explanation, the operator needs to understand which nodes in the tree are at the origin of this preference. This is obtained by computing an index measuring the influence of each node in the tree on the preference between two options. This influence index can be computed for a leaf node such as PH-M1-P, or for an intermediate node such as PH-M1, as all nodes in the tree make sense to the user. There are many connections with Feature Attribution (FA) in Machine Learning. Computing the level of contribution of a feature in a classification black-box model or that of a criterion in an MCDA model is indeed similar.

The Shapley value is one of the leading concepts for FA [75]. Unlike our situation, feature attribution only computes the influence of leaves in a model. It has been argued that the Shapley value is not appropriate on trees [47], when we are interested in knowing the contribution level of not only the leaves but also other nodes. A specific value for trees – called the Winter value – has been defined [76], [47]. The Winter value takes the form of a recursive call of the Shapley value at several nodes in the hierarchy. Our influence index, applied to any node in a tree, is thus given by the Winter value [47].

The main drawback of the Shapley and the Winter values is that their expressions contain an exponential number of terms in the number of inputs. Several approaches have been proposed to approximate the computation of the Shapley value [12], [53] – see Section 6. The drawback of these methods is that it is hard to have accurate and reliable bounds of the error made. We explore other avenues in this paper. This paper aims to propose efficient methods to compute the Winter value on hierarchical Choquet integrals. Section 2 recalls the background on MCDA, the Choquet integral, and the Winter value.
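To make the combinatorial blow-up concrete, a direct implementation of the Shapley value enumerates all coalitions. The sketch below is our own illustrative code (not one of the paper's methods); it needs 2^(n-1) evaluations of the game per player, hence exponential cost in n.

```python
from itertools import combinations
from math import factorial

def shapley_values(n, v):
    """Exact Shapley values of a cooperative game v over players 0..n-1.

    phi_i = sum over S not containing i of
            |S|! (n-|S|-1)! / n! * (v(S u {i}) - v(S)).
    Enumerates every such coalition S, so the cost is exponential in n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for s in range(len(others) + 1):
            w = factorial(s) * factorial(n - s - 1) / factorial(n)
            for S in combinations(others, s):
                phi[i] += w * (v(set(S) | {i}) - v(set(S)))
    return phi

# For an additive game, the Shapley value of each player is its own weight.
weights = [1.0, 2.0, 3.0]
print(shapley_values(3, lambda S: sum(weights[j] for j in S)))
# approximately [1.0, 2.0, 3.0]
```

The efficiency axiom (the values sum to v of the grand coalition) also holds, which is a useful sanity check on any implementation.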

The first efficient method considers the case in which there are many criteria (possibly several hundreds or thousands) organized flatly – see Section 3. As the Choquet integral is a linear combination of minimum functions, we focus on the min function. We show that the Winter value of a min function can be computed by a linear number of operations. Section 3.3 considers the problem of interpreting an HCI model, which amounts to explaining the difference between two particular alternatives (namely alternatives with extreme values). A very simple and intuitive formula is given.
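The decomposition invoked here is the classical Möbius representation: any Choquet integral can be written as C(x) = Σ_B m(B) min_{j∈B} x_j, where m is the Möbius transform of the capacity. The sketch below (our own code, using our own function names) computes m by inclusion-exclusion and evaluates the min-combination on a two-criteria capacity with positive interaction.

```python
from itertools import combinations

def mobius(mu, n):
    """Möbius transform of a capacity mu on criteria 0..n-1:
    m(B) = sum over C subset of B of (-1)^(|B|-|C|) mu(C)."""
    m = {}
    for b in range(n + 1):
        for B in combinations(range(n), b):
            Bset = frozenset(B)
            m[Bset] = sum((-1) ** (len(Bset) - c) * mu(frozenset(C))
                          for c in range(len(B) + 1)
                          for C in combinations(B, c))
    return m

def choquet_min_form(x, m):
    """Choquet integral as a linear combination of min functions:
    C(x) = sum over nonempty B of m(B) * min_{j in B} x_j."""
    return sum(coef * min(x[j] for j in B)
               for B, coef in m.items() if B)

# Capacity on two criteria with positive interaction between them.
cap = {frozenset(): 0.0, frozenset({0}): 0.3,
       frozenset({1}): 0.5, frozenset({0, 1}): 1.0}
m = mobius(cap.__getitem__, 2)
print(m[frozenset({0, 1})])             # interaction term, ~ 0.2
print(choquet_min_form([0.2, 0.5], m))  # 0.3*0.2 + 0.5*0.5 + 0.2*0.2 ~ 0.35
```

Since the Winter value is linear in the game, computing it for each min term and summing with the Möbius coefficients reduces the flat case to the min function studied in Section 3.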

Section 4 presents a novel algorithm computing the Winter value for an HCI model. We first note that the Winter value (and also the Shapley value) can be written as an average added value of a given attribute over a combinatorial structure. Taking inspiration from Branch & Bound (BB) algorithms, we develop a method that prunes the exploration of the combinatorial structure when computing the average. An advantage of our algorithm compared to the existing ones is that it is exact. We develop this idea when the aggregation functions are Choquet integrals. A major difficulty is handling the hierarchy of aggregation models. Our algorithm is not a simple adaptation of BB, as the latter computes an optimal solution over a combinatorial structure, whereas we need to compute the average value of a quantity over all leaves of the tree.
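The pruning principle can be illustrated on a toy problem of averaging leaf values of a tree: when the lower and upper bounds over a subtree coincide, every leaf below contributes the same amount, so the subtree is accounted for in one step without being expanded. The classes and names below are our own toy sketch; the paper's algorithm applies this idea to upper and lower bounds of the utility on subtrees of the HCI model.

```python
class Node:
    """Tree node caching leaf count and bounds on the leaf values below it."""
    def __init__(self, value=None, children=None):
        self.children = children or []
        if self.children:
            self.n = sum(c.n for c in self.children)
            self.lo = min(c.lo for c in self.children)
            self.hi = max(c.hi for c in self.children)
        else:
            self.n, self.lo, self.hi = 1, value, value

def pruned_sum(node, visited):
    """Sum of all leaf values below `node`, pruning constant subtrees."""
    if node.lo == node.hi:           # all leaves below are identical:
        return node.lo * node.n      # account for them without expanding
    visited.append(node)
    return sum(pruned_sum(c, visited) for c in node.children)

leaves = [Node(value=v) for v in (1.0, 1.0, 1.0, 1.0, 2.0, 4.0)]
tree = Node(children=[Node(children=leaves[:4]),   # constant subtree: pruned
                      Node(children=leaves[4:])])
visited = []
avg = pruned_sum(tree, visited) / tree.n
print(avg)            # (4*1 + 2 + 4) / 6, i.e. ~1.667
print(len(visited))   # only 2 internal nodes expanded, no leaf visited
```

The difficulty the paper addresses, which this toy hides, is obtaining cheap yet tight bounds when the leaf values come from nested Choquet integrals.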

Finally, for a particular class of Choquet integrals (namely the 2-additive model) that is widely used in practice, we experimentally demonstrate in Section 5 the efficiency of our algorithm on a large number of randomly generated trees and models.


Multi-attribute preference and aggregation models

We are interested in representing the preferences of a decision maker regarding how to compare several alternatives described by a finite set A of attributes. Attribute j ∈ A is depicted by a set X_j, and alternatives are elements of X_A, where X_B := ×_{j∈B} X_j for any B ⊆ A. Preferences are assumed to be modeled by a numerical utility u : X_A → ℝ. For x ∈ X_A and B ⊆ A, x_B denotes the restriction of x to B.

To be interpretable, model u is organized hierarchically, as in Fig. 1. We consider a rooted tree T whose

Approach “computation over the top node first while fixing the other levels”

Following Miller's law [58], an aggregation node is easily understandable by a decision maker if it has a relatively small number of children (say between 2 and 6). Most hierarchical MCDA models developed in applications satisfy this property. However, it might arise in practice that some nodes contain a very large number of inputs.

Example 4 (continuation of Example 1)

The number of children of aggregation nodes j=PH and j=OPH is the number of metro line segments (3 in the example). In practice, if we consider a large city with 20

Computation of the explanation for the HCI model by an approach “enumerate over top node first”

We present in this section a new algorithm to compute the influence index (6) of a given criterion iA when all aggregation functions Fj are general Choquet integrals. We first give some pruning and recursive properties on the influence index and then we describe the exact algorithm. We proceed in the opposite way compared to Section 3. Instead of fixing all levels but the first one, we start by simplifying the aggregation function at the first level (decomposition w.r.t. additivity, reduction

Experimental analysis of the computation time

This section compares the computation time of WINT-P with that of WINT. In the general case, we cannot obtain interesting bounds relating these two algorithms. In the best case, WINT-P can prune the tree at the highest level and requires only a few operations, while WINT is exponential in the number of criteria. In the worst case, there might be no pruning in WINT-P, so that the whole tree N_1 × ⋯ × N_q needs to be explored, as in WINT. However, there are extra-computation

Related works

We have considered in this work a general class of MCDA models – namely the HCI. This concept is similar to the multiple criteria hierarchy process for Choquet integrals [3]. Some decomposability properties of HCI are described in [27].

Several papers have proposed to learn this model. In [13], [40], a combination of genetic algorithm and neural network is used to learn the structure and the parameters of an HCI model. The elicitation of a Choquet integral and its utility functions based on linear

Conclusion

The Winter value, which is an extension of the Shapley value on trees, is used to identify which attributes mostly contribute to the decision. A bottleneck for the practical usage of these two values is their exponential number of terms. We have proposed two approaches to compute this value.

The first one is dedicated to the particular case where the root node has a large number of children. As the Choquet integral is a linear combination of minimum functions and the Winter value is linear in

Proof of Theorem 1

Before giving the proof of this result, let us start with elementary results.

Lemma 8

For $m \le n$, we have
$$\sum_{T \subseteq K} \frac{(s+t)!\,(n-s-t-1)!}{n!} = \frac{s!\,(n-s-m-1)!}{(n-m)!}$$
where $t = |T|$ and $m = |K|$.

Proof

Let $F_{m,n}^{s} := \sum_{T \subseteq K} \frac{(s+t)!\,(n-s-t-1)!}{n!}$ with $m \le n$, writing $t = |T|$ and $m = |K|$. For $K = \emptyset$, we have $F_{0,n}^{s} = \frac{s!\,(n-s-1)!}{n!}$.

For $K \ne \emptyset$, we write, for $i \in K$,
$$
\begin{aligned}
F_{m,n}^{s} &= \sum_{T \subseteq K \setminus \{i\}} \frac{(s+t)!\,(n-s-t-1)!}{n!} + \sum_{\substack{T' \subseteq K \\ T' = T \cup \{i\}}} \frac{(s+t+1)!\,(n-s-t-2)!}{n!} \\
&= \sum_{T \subseteq K \setminus \{i\}} \frac{(s+t)!\,(n-s-t-2)!}{n!}\,\bigl[(n-s-t-1) + (s+t+1)\bigr] \\
&= \sum_{T \subseteq K \setminus \{i\}} \frac{(s+t)!\,(n-s-t-2)!}{(n-1)!} \\
&= F_{m-1,n-1}^{s}
\end{aligned}
$$
Hence, as $m \le n$, $F_{m,n}^{s} = F_{m-1,n-1}^{s} = \cdots = F_{0,n-m}^{s} = \frac{s!\,(n-m-s-1)!}{(n-m)!}$. □
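As a sanity check, the identity of Lemma 8 can be verified numerically by brute-force enumeration of the subsets $T \subseteq K$ (our own illustrative code, with $t = |T|$ and $m = |K|$):

```python
from itertools import combinations
from math import factorial

def lemma8_lhs(n, s, m):
    """Sum over all T subset of K (|K| = m) of (s+t)!(n-s-t-1)!/n!."""
    K = range(m)
    return sum(factorial(s + len(T)) * factorial(n - s - len(T) - 1)
               / factorial(n)
               for t in range(m + 1) for T in combinations(K, t))

def lemma8_rhs(n, s, m):
    """Closed form of the lemma: s!(n-s-m-1)!/(n-m)!."""
    return factorial(s) * factorial(n - s - m - 1) / factorial(n - m)

print(lemma8_lhs(6, 1, 2), lemma8_rhs(6, 1, 2))  # both equal 1/12
```

The check requires $n - s - m - 1 \ge 0$ so that all factorials are defined, consistent with the hypothesis $m \le n$ of the lemma.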

Proof of Theorem 1

First of all, if there exists $j \in A \setminus \{i\}$

CRediT authorship contribution statement

Christophe Labreuche: Conceptualization, Formal analysis, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing.

Declaration of Competing Interest

The author reports that support was provided by Thales SA. Christophe Labreuche, H. Pouyllau, and B. Goujon have patent #BET 17P3607 licensed to Thales SA.

Acknowledgements

This paper is supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No 825619 (AI4EU project).

References (79)

  • K. Aas et al., Explaining individual predictions when features are dependent: more accurate approximations to Shapley values
  • M. Ancona et al., Explaining deep neural networks with a polynomial time algorithm for Shapley value approximation
  • S. Angilella et al., Multiple criteria hierarchy process for the Choquet integral
  • A.B. Arrieta et al., Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI, Inf. Fusion (2020)
  • K. Belahcène et al., Accountable approval sorting
  • K. Belahcène et al., Comparing options with argument schemes powered by cancellation
  • G. Beliakov et al., Choquet integral-based measures of economic welfare and species diversity, Int. J. Intell. Syst. (2022)
  • P. Bhowal et al., Fuzzy ensemble of deep learning models using Choquet fuzzy integral, coalition game and information theory for breast cancer histology classification, Expert Syst. Appl. (2022)
  • R. Bresson et al., On the identifiability of hierarchical decision models
  • R. Bresson et al., Neural representation and learning of hierarchical 2-additive Choquet integrals
  • H. Chen et al., Explaining models by propagating Shapley values of local components
  • J. Chen et al., L-Shapley and C-Shapley: efficient model interpretation for structured data
  • G. Choquet, Theory of capacities, Ann. Inst. Fourier (1953)
  • S. Cohen et al., Feature selection based on the Shapley value
  • A. Datta et al., Algorithmic transparency via quantitative input influence: theory and experiments with learning systems
  • H. Evangelista de Oliveira et al., Identification of the Choquet integral parameters in the interaction index domain by means of sparse modeling
  • X. Deng et al., On the complexity of cooperative solution concepts, Math. Oper. Res. (1994)
  • S. Dumnić et al., Application of the Choquet integral: a case study on a personnel selection problem, Sustainability (2022)
  • ESASSP, EUROCONTROL Specification for ATM Surveillance System Performance (Volume 1) (2012)
  • C. Frye et al., Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability
  • D. Fryer et al., Shapley values for feature selection: the good, the bad, and the axioms, IEEE Access (2021)
  • K. Fujimoto et al., Hierarchical decomposition of the Choquet integral
  • L. Galand et al., Dominance rules for the Choquet integral in multiobjective dynamic programming
  • A. Ghorbani et al., Data Shapley: equitable valuation of data for machine learning
  • A. Ghorbani et al., Neuron Shapley: discovering the responsible neurons
  • P. Giudici et al., Shapley-Lorenz explainable artificial intelligence
  • M. Grabisch et al., A decade of application of the Choquet and Sugeno integrals in multi-criteria decision aid, Ann. Oper. Res. (2010)
  • R. Guidotti et al., A survey of methods for explaining black box models, ACM Comput. Surv. (2018)
This paper is an extended version of the conference paper [50].