How Much Can Stratification Improve the Approximation of Shapley Values?

Conference paper · Explainable Artificial Intelligence (xAI 2024)

Abstract

Over the last decade, the Shapley value has become one of the most widely applied tools for providing post-hoc explanations of black-box models. However, its theoretically justified solution to the problem of dividing a collective benefit among the members of a group, such as features or data points, comes at a price: without strong assumptions, the exponential number of member subsets precludes an exact calculation of the Shapley value. In search of a remedy, recent works have demonstrated the efficacy of approximations based on sampling with stratification, in which the sample space is partitioned into smaller subpopulations. The effectiveness of this technique depends mainly on the degree to which the allocation of available samples over the formed strata mirrors their unknown variances. To uncover the hypothetical potential of stratification, we investigate the gap in approximation quality caused by the lack of knowledge of the optimal allocation. Moreover, we combine recent advances to propose two state-of-the-art algorithms, Adaptive SVARM and Continuous Adaptive SVARM, which adjust the sample allocation on the fly. The potential of our approach is assessed in an empirical evaluation.

References

  1. European Commission: Ethics guidelines for trustworthy AI (2019). https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai

  2. Ancona, M., Öztireli, C., Gross, M.H.: Explaining deep neural networks with a polynomial time algorithm for Shapley value approximation. In: Proceedings of International Conference on Machine Learning (ICML), pp. 272–281 (2019)

  3. Aumann, R.J.: Economic applications of the Shapley value. In: Mertens, J.F., Sorin, S. (eds.) Game-Theoretic Methods in General Equilibrium Analysis. NATO ASI Series, vol. 77, pp. 121–133. Springer, Dordrecht (1994). https://doi.org/10.1007/978-94-017-1656-7_12

  4. Burgess, M.A., Chapman, A.C.: Approximating the Shapley value using stratified empirical Bernstein sampling. In: Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), pp. 73–81 (2021)

  5. Castro, J., Gómez, D., Molina, E., Tejada, J.: Improving polynomial estimation of the Shapley value by stratified random sampling with optimum allocation. Comput. Oper. Res. 82, 180–188 (2017)

  6. Castro, J., Gómez, D., Tejada, J.: Polynomial calculation of the Shapley value based on sampling. Comput. Oper. Res. 36(5), 1726–1730 (2009)

  7. Covert, I., Lundberg, S., Lee, S.I.: Shapley feature utility. In: Machine Learning in Computational Biology (2019)

  8. Covert, I., Lundberg, S.M., Lee, S.: Understanding global feature contributions with additive importance measures. In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS) (2020)

  9. Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009)

  10. Deng, X., Papadimitriou, C.H.: On the complexity of cooperative solution concepts. Math. Oper. Res. 19(2), 257–266 (1994)

  11. Fanaee-T, H.: Bike Sharing. UCI Machine Learning Repository (2013). https://doi.org/10.24432/C5W894

  12. Ghorbani, A., Zou, J.Y.: Data Shapley: equitable valuation of data for machine learning. In: Proceedings of International Conference on Machine Learning (ICML), pp. 2242–2251 (2019)

  13. Ghorbani, A., Zou, J.Y.: Neuron Shapley: discovering the responsible neurons. In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS) (2020)

  14. Hofmann, H.: Statlog (German Credit Data). UCI Machine Learning Repository (1994). https://doi.org/10.24432/C5NC77

  15. Illés, F., Kerényi, P.: Estimation of the Shapley value by ergodic sampling. CoRR abs/1906.05224 (2019)

  16. Jia, R., et al.: Towards efficient data valuation based on the Shapley value. In: Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 1167–1176 (2019)

  17. Kohavi, R.: Scaling up the accuracy of Naive-Bayes classifiers: a decision-tree hybrid. In: Proceedings of International Conference on Knowledge Discovery and Data Mining (KDD), pp. 202–207 (1996)

  18. Kolpaczki, P., Bengs, V., Muschalik, M., Hüllermeier, E.: Approximating the Shapley value without marginal contributions. In: Proceedings of AAAI Conference on Artificial Intelligence (AAAI), pp. 13246–13255 (2024)

  19. Lundberg, S.M., Erion, G.G., Lee, S.: Consistent individualized feature attribution for tree ensembles. CoRR abs/1802.03888 (2018)

  20. Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS), pp. 4768–4777 (2017)

  21. Maleki, S., Tran-Thanh, L., Hines, G., Rahwan, T., Rogers, A.: Bounding the estimation error of sampling-based Shapley value approximation with/without stratifying. CoRR abs/1306.4265 (2013)

  22. Mitchell, R., Cooper, J., Frank, E., Holmes, G.: Sampling permutations for Shapley value estimation. J. Mach. Learn. Res. 23(43), 1–46 (2022)

  23. Moro, S., Laureano, R., Cortez, P.: Using data mining for bank direct marketing: an application of the CRISP-DM methodology. In: Proceedings of European Simulation and Modelling Conference (ESM) (2011)

  24. Neyman, J.: On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. J. Roy. Stat. Soc. 97(4), 558–625 (1934)

  25. O’Brien, G., Gamal, A.E., Rajagopal, R.: Shapley value estimation for compensation of participants in demand response programs. IEEE Trans. Smart Grid 6(6), 2837–2844 (2015)

  26. Okhrati, R., Lipani, A.: A multilinear sampling algorithm to estimate Shapley values. In: Proceedings of International Conference on Pattern Recognition (ICPR), pp. 7992–7999 (2020)

  27. Rozemberczki, B., Sarkar, R.: The Shapley value of classifiers in ensemble games. In: Proceedings of ACM International Conference on Information and Knowledge Management (CIKM), pp. 1558–1567 (2021)

  28. Rozemberczki, B., et al.: The Shapley value in machine learning. In: Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), pp. 5572–5579 (2022)

  29. Shapley, L.S.: A value for n-person games. In: Contributions to the Theory of Games (AM-28), Volume II, pp. 307–318. Princeton University Press (1953)

  30. van Campen, T., Hamers, H., Husslage, B., Lindelauf, R.: A new approximation method for the Shapley value applied to the WTC 9/11 terrorist attack. Soc. Netw. Anal. Min. 8(3), 1–12 (2018)

Acknowledgments

This research was supported by the research training group Dataninja (Trustworthy AI for Seamless Problem Solving: Next Generation Intelligence Joins Robust Data Analysis) funded by the German federal state of North Rhine-Westphalia.

Author information

Correspondence to Patrick Kolpaczki.

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Appendices

A Pseudocode

CalculateAllocation begins by computing the variance estimates from the observations collected during the exploration phase. Next, it estimates the optimal allocation in potentially multiple iterations. The sizes \(\ell \) which are still part of the optimization problem are kept in \(\mathcal {M}\). Each size whose sample number \(\hat{m}_{\ell }^*\) does not exceed its count \(c_{\ell }\) from the exploration phase, i.e. \(\hat{m}_{\ell }^* \le c_{\ell }\), is assigned after the computation to the set of dropouts \(\mathcal {F}\). The considered available budget \(\bar{T}'\) is reduced by the sum of the sample numbers of the current dropouts. We assume Round to return a vector of natural numbers by rounding appropriately.
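
To make the control flow concrete, the following Python sketch mimics this procedure under our own naming; it is an illustration rather than the paper's Algorithm 2, and it assumes the \(\sqrt{c_{\ell }}\)-proportional rule of Theorem 4 as the allocation formula.

```python
import numpy as np

def calculate_allocation(c_hat, counts, budget):
    # c_hat:  estimated variance coefficients c_l per coalition size l
    # counts: samples c_l already drawn per size during the exploration phase
    # budget: remaining budget T-bar' to distribute
    sizes = set(c_hat)                 # M: sizes still part of the optimization
    dropouts = {}                      # F: sizes fixed to their exploration count
    alloc = {}
    while sizes:
        weights = {l: np.sqrt(c_hat[l]) for l in sizes}
        total = sum(weights.values())
        alloc = {l: (budget * weights[l] / total if total > 0
                     else budget / len(sizes)) for l in sizes}
        # a size already covered by the exploration phase becomes a dropout
        drop = {l for l in sizes if alloc[l] <= counts[l]}
        if not drop:
            break
        for l in drop:
            dropouts[l] = counts[l]
            budget -= counts[l]        # reduce the considered budget T-bar'
        sizes -= drop
    # Round: map the fractional sample numbers to natural numbers
    result = {l: max(round(alloc[l]), 0) for l in sizes}
    result.update(dropouts)
    return result
```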

Update refreshes the affected stratum estimates identically to [18] by iterating over all players. The variance estimation is prepared by maintaining, for each stratum, the sum of observed coalition values in \(\varSigma _{i,\ell }^{+/-}\) and the sum of squared values in \(\varOmega _{i,\ell }^{+/-}\). Note that the value function is accessed only once.
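
A minimal sketch of this bookkeeping (our naming, assuming coalitions are passed as Python sets): the coalition A updates player i's positive stratum of size \(|A|-1\) if \(i \in A\), and the negative stratum of size |A| otherwise, as in [18].

```python
from collections import defaultdict

def make_state():
    # Running sums for the stratum statistics: sigma holds the summed coalition
    # values, omega the summed squared values, count the number of updates.
    return {"sigma": defaultdict(float),
            "omega": defaultdict(float),
            "count": defaultdict(int)}

def update(A, value, n, state):
    v = value(A)                   # the value function is accessed only once
    s = len(A)
    for i in range(n):             # iterate over all players
        if i in A:
            key = ("+", i, s - 1)  # A updates the positive stratum of size |A|-1
        else:
            key = ("-", i, s)      # and the negative stratum of size |A| otherwise
        state["sigma"][key] += v
        state["omega"][key] += v * v
        state["count"][key] += 1
```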

ExactCalculation iterates over all coalitions of size \(0,1,n-1,n\). Each coalition is used to call the Update procedure. After spending \(2n+2\) budget tokens, the strata values \(\phi _{i,0}^-,\phi _{i,0}^+,\phi _{i,1}^-,\phi _{i,n-2}^+,\phi _{i,n-1}^-,\phi _{i,n-1}^+\) are computed exactly.
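
A corresponding sketch (ours) that reuses the update function above; the budget of \(2n+2\) evaluations follows from \(1 + n + n + 1\) coalitions:

```python
from itertools import combinations

def exact_calculation(value, n, state):
    # Evaluate all coalitions of size 0, 1, n-1, and n: 1 + n + n + 1 = 2n + 2
    # value-function calls, after which the border strata are exact.
    for size in sorted({0, 1, n - 1, n}):
        for A in combinations(range(n), size):
            update(set(A), value, n, state)
```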

Algorithm 2: CalculateAllocation

Algorithm 3: Update(A)

Algorithm 4: ExactCalculation [18]

B Proofs

Static Allocation for Stratified Sampling. Proof of Theorem 1: Let \(x_{i,\ell }^{(m)}\) be the m-th sampled marginal contribution from player i’s \(\ell \)-th stratum. Since \(m_{i,\ell } \ge 1\) for all \(i \in \mathcal {N}\) and \(\ell \in \mathcal {L}_{\varDelta }\), and the marginal contributions are drawn uniformly at random from their respective strata, each estimate \(\hat{\phi }_{i,\ell }\) and \(\hat{\phi }_i\) is unbiased:

$$\begin{aligned} \mathbb {E} \left[ \hat{\phi }_i \right] = \frac{1}{n} \sum \limits _{\ell =0}^{n-1} \mathbb {E} \left[ \hat{\phi }_{i,\ell } \right] = \frac{1}{n} \sum \limits _{\ell =0}^{n-1} \mathbb {E} \left[ \frac{1}{m_{i,\ell }} \sum \limits _{m=1}^{m_{i,\ell }}x_{i,\ell }^{(m)} \right] = \frac{1}{n} \sum \limits _{\ell =0}^{n-1} \phi _{i,\ell } = \phi _i \, . \end{aligned}$$

Next, we investigate the variance, where we make use of the independence of the samples:

$$\begin{aligned} \mathbb {V} \left[ \hat{\phi }_i \right] = \frac{1}{n^2} \sum \limits _{\ell =0}^{n-1} \mathbb {V} \left[ \hat{\phi }_{i,\ell } \right] = \frac{1}{n^2} \sum \limits _{\ell =0}^{n-1} \frac{1}{m_{i,\ell }^2} \sum \limits _{m=1}^{m_{i,\ell }} \mathbb {V} \left[ x_{i,\ell }^{(m)} \right] = \frac{1}{n^2} \sum \limits _{\ell =0}^{n-1} \frac{\sigma _{i,\ell }^2}{m_{i,\ell }} \, . \end{aligned}$$

The bias-variance decomposition allows us to combine both intermediate results and to quantify the MSE for a single player. Averaging over all players yields:

$$\begin{aligned} \frac{1}{n} \sum \limits _{i=1}^n \mathbb {E} \left[ \left( \hat{\phi }_i - \phi _i \right) ^2 \right] = \frac{1}{n} \sum \limits _{i=1}^n \left( \mathbb {E} \left[ \hat{\phi }_i \right] - \phi _i \right) ^2 + \mathbb {V} \left[ \hat{\phi }_i \right] = \frac{1}{n^3} \sum \limits _{i=1}^n \sum \limits _{\ell =0}^{n-1} \frac{\sigma _{i,\ell }^2}{m_{i,\ell }} \, . \end{aligned}$$

   \(\square \)
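
As an illustration (ours, with made-up toy numbers), both properties can be verified by simulation for a single player with \(n = 4\) strata:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 4                                      # strata of a single player i
mu = np.array([1.0, -0.5, 2.0, 0.3])       # stratum means phi_{i,l} (made up)
sd = np.array([0.5, 1.5, 0.2, 1.0])        # stratum std devs sigma_{i,l} (made up)
m = np.array([10, 30, 5, 20])              # allocated samples m_{i,l}

runs = 50_000
est = np.empty(runs)
for r in range(runs):
    # hat phi_i = (1/n) * sum_l (mean of m_l samples drawn from stratum l)
    est[r] = np.mean([rng.normal(mu[l], sd[l], m[l]).mean() for l in range(n)])

print(est.mean(), mu.mean())                # both ~0.7: unbiasedness
print(est.var(), (sd**2 / m).sum() / n**2)  # both ~0.0099: variance formula
```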

Static Allocation for Stratified SVARM. Proof of Theorem 2: The unbiasedness of each estimate \(\hat{\phi }_{i,\ell }^+\), \(\hat{\phi }_{i,\ell }^-\), and \(\hat{\phi }_i\) has been shown in Lemma 8, Lemma 9, and Theorem 5 of [18]. Let \(\bar{m}_{i,\ell ,+}, \bar{m}_{i,\ell ,-}\) be the numbers of times the estimates \(\hat{\phi }_{i,\ell }^+\) and \(\hat{\phi }_{i,\ell }^-\), respectively, were updated after the warm-up, and let \(m_{i,\ell ,+}, m_{i,\ell ,-}\) be the total numbers including the warm-up.

Lemma 1

For any \(i \in \mathcal {N}\) and \(\ell \in \mathcal {L}_{\nu }'\), the numbers of updates are binomially distributed with \( \bar{m}_{i,\ell -1,+} \sim Bin\left( m_{\ell }, \frac{\ell }{n} \right) \) and \(\bar{m}_{i,\ell ,-} \sim Bin\left( m_{\ell }, \frac{n-\ell }{n} \right) \).

The proof is analogous to that of Lemma 10 in [18]. We utilize Lemma 1 to bound the following expectations for all \(i \in \mathcal {N}\) and all \(\ell \in \mathcal {L}_{\nu }'\) (cf. Lemma 13 in [18]):

$$\begin{aligned} \mathbb {E} \left[ \frac{1}{m_{i,\ell -1,+}} \right] \le \frac{n}{m_{\ell } \cdot \ell }\quad \text {and} \quad \mathbb {E} \left[ \frac{1}{m_{i,\ell ,-}} \right] \le \frac{1}{\mathbb {E} \left[ \bar{m}_{i,\ell ,-} \right] } \le \frac{n}{m_\ell (n-\ell )} \, . \end{aligned}$$

From the proof of Theorem 6 in [18] we extract and reuse:

$$\begin{aligned} \mathbb {V} \left[ \hat{\phi }_i \right] \le & \ \mathbb {E} \left[ \frac{1}{n^2} \sum \limits _{\ell =2}^{n-2} \frac{\sigma _{i,\ell -1,+}^2}{m_{i,\ell -1,+}} + \frac{\sigma _{i,\ell ,-}^2}{m_{i,\ell ,-}} \right] \\ = & \ \frac{1}{n^2} \sum \limits _{\ell =2}^{n-2} \sigma _{i,\ell -1,+}^2 \cdot \mathbb {E} \left[ \frac{1}{m_{i,\ell -1,+}} \right] + \sigma _{i,\ell ,-}^2 \cdot \mathbb {E} \left[ \frac{1}{m_{i,\ell ,-}} \right] \\ \le & \ \frac{1}{n} \sum \limits _{\ell =2}^{n-2} \frac{1}{m_{\ell }} \left( \frac{\sigma _{i,\ell -1,+}^2}{\ell } + \frac{\sigma _{i,\ell ,-}^2}{n-\ell } \right) \, . \end{aligned}$$

Since the unbiasedness is already given, the bound on the variance enables us to take advantage of the bias-variance decomposition:

$$\begin{aligned} \mathbb {E} \left[ \left( \hat{\phi }_i - \phi _i \right) ^2 \right] = \left( \mathbb {E} \left[ \hat{\phi }_i \right] - \phi _i \right) ^2 + \mathbb {V} \left[ \hat{\phi }_i \right] \le \frac{1}{n} \sum \limits _{\ell =2}^{n-2} \frac{1}{m_{\ell }} \left( \frac{\sigma _{i,\ell -1,+}^2}{\ell } + \frac{\sigma _{i,\ell ,-}^2}{n-\ell } \right) \, . \end{aligned}$$

Averaging the mean squared error over the players completes the proof:

$$\begin{aligned} \frac{1}{n} \sum \limits _{i=1}^n \mathbb {E} \left[ \left( \hat{\phi }_i - \phi _i \right) ^2 \right] \le \frac{1}{n^2} \sum \limits _{\ell =2}^{n-2} \frac{1}{m_{\ell }} \sum \limits _{i=1}^n \left( \frac{\sigma _{i,\ell -1,+}^2}{\ell } + \frac{\sigma _{i,\ell ,-}^2}{n-\ell } \right) \, . \end{aligned}$$

   \(\square \)
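
The expectation bound borrowed from Lemma 13 of [18] can also be checked numerically; the snippet below (ours, with arbitrary values) computes \(\mathbb {E}[1/(1+X)]\) for \(X \sim Bin(m_{\ell }, p)\) exactly and compares it to the claimed bound \(n / (m_{\ell }(n-\ell ))\):

```python
from math import comb

def expected_inverse(m, p):
    # E[1 / (1 + X)] for X ~ Bin(m, p), summed exactly over the support
    return sum(comb(m, k) * p**k * (1 - p)**(m - k) / (k + 1) for k in range(m + 1))

n, ell, m_l = 12, 4, 50                     # arbitrary toy values
p = (n - ell) / n                           # update probability of the '-' stratum
lhs = expected_inverse(m_l, p)              # upper-bounds E[1/m_{i,l,-}]: warm-up adds 1
print(lhs, n / (m_l * (n - ell)))           # lhs is the smaller value, as claimed
```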

Optimal Allocation for Stratified Sampling. Proof of Theorem 3: The relaxed sample allocation problem for marginal contributions is given by:

$$\begin{aligned} (m_{i,\ell }^*)_{i \in \mathcal {N}, \ell \in \mathcal {L}_{\varDelta }'} = \mathop {\text {arg min}}\limits _{(m_{i,\ell })_{i \in \mathcal {N}, \ell \in \mathcal {L}_{\varDelta }'}} & \qquad \frac{1}{n^3} \sum \limits _{i=1}^n \sum \limits _{\ell =1}^{n-2} \frac{\sigma _{i,\ell }^2}{m_{i,\ell }} \\ \text {s.t.} & \qquad 2 \sum \limits _{i=1}^n \sum \limits _{\ell =1}^{n-2} m_{i,\ell } = \tilde{T} \\ & \qquad m_{i,\ell } \in \mathbb {R}_{>0} \qquad \qquad \forall i \in \mathcal {N}, \ell \in \mathcal {L}_{\varDelta }' \, . \end{aligned}$$

We derive the solution by employing Lagrange multipliers; hence, we minimize

$$\begin{aligned} L \left( (m_{i,\ell })_{i \in \mathcal {N}, \ell \in \mathcal {L}_{\varDelta }'}, \lambda \right) = \frac{1}{n^3} \sum \limits _{i=1}^n \sum \limits _{\ell =1}^{n-2} \frac{\sigma _{i,\ell }^2}{m_{i,\ell }} + \lambda \left( 2 \sum \limits _{i=1}^n \sum \limits _{\ell =1}^{n-2} m_{i,\ell } - \tilde{T} \right) \, . \end{aligned}$$

We solve for \(\nabla L = 0\) which requires

$$\begin{aligned} \frac{\partial }{\partial m_{i,\ell }} L = - \frac{\sigma _{i,\ell }^2}{n^3 m_{i,\ell }^2} + 2\lambda {\mathop {=}\limits ^{!}} 0 \,\, \forall i \in \mathcal {N}, \ell \in \mathcal {L}_{\varDelta }' \quad \text {and} \quad \frac{\partial }{\partial \lambda } L = 2 \sum \limits _{i=1}^n \sum \limits _{\ell =1}^{n-2} m_{i,\ell } - \tilde{T} {\mathop {=}\limits ^{!}} 0 \, . \end{aligned}$$

Together with the budget constraint, this system of equations yields

$$\begin{aligned} m_{i,\ell } = \frac{\sigma _{i,\ell }}{\sqrt{2 \lambda n^3}} \quad \forall i \in \mathcal {N}, \ell \in \mathcal {L}_{\varDelta }' \quad \text {and} \quad \lambda = \frac{2}{n^3\tilde{T}^2} \left( \sum \limits _{i=1}^n \sum \limits _{\ell =1}^{n-2} \sigma _{i,\ell } \right) ^2 \, , \end{aligned}$$

from which we derive the optimal allocation minimizing the objective function:

$$\begin{aligned} m_{i,\ell } = \frac{\sigma _{i,\ell }}{2 \sum \limits _{j=1}^n \sum \limits _{k=1}^{n-2} \sigma _{j,k}} \cdot \tilde{T} \, . \end{aligned}$$

   \(\square \)
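
As a plausibility check (ours, with arbitrary toy values), no other allocation spending the same budget should attain a smaller objective than the \(\sigma \)-proportional one:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = rng.uniform(0.1, 3.0, size=20)      # stand-ins for the sigma_{i,l}
T = 500.0                                   # budget to distribute over the strata

def objective(m):
    return (sigma**2 / m).sum()             # proportional to the MSE bound

m_star = T * sigma / sigma.sum()            # allocation proportional to sigma_{i,l}
for _ in range(1000):
    w = rng.uniform(0.1, 1.0, size=20)      # random competing allocation, same budget
    assert objective(m_star) <= objective(T * w / w.sum()) + 1e-9
print("sigma-proportional allocation attains the minimum")
```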

Optimal Allocation for Stratified SVARM. Proof of Theorem 4: The relaxed sample allocation problem for coalitions is given by:

$$\begin{aligned} (m_{\ell }^*)_{\ell \in \mathcal {L}_{\nu }'} = \mathop {\text {arg min}}\limits _{(m_{\ell })_{\ell \in \mathcal {L}_{\nu }'}} & \qquad \frac{1}{n^2} \sum \limits _{\ell =2}^{n-2} \frac{c_{\ell }}{m_{\ell }} \\ \text {s.t.} & \qquad \sum \limits _{\ell =2}^{n-2} m_{\ell } = \bar{T} \\ & \qquad m_{\ell } \in \mathbb {R}_{>0} \qquad \qquad \forall \ell \in \mathcal {L}_{\nu }' \, \end{aligned}$$

where we define the coefficients \(c_{\ell } := \sum \limits _{i=1}^n \left( \frac{\sigma _{i,\ell -1,+}^2}{\ell } + \frac{\sigma _{i,\ell ,-}^2}{n-\ell } \right) \). We derive the solution by employing Lagrange multipliers; hence, we minimize

$$\begin{aligned} L(m_2,\ldots ,m_{n-2},\lambda ) = \frac{1}{n^2} \sum \limits _{\ell =2}^{n-2} \frac{c_{\ell }}{m_{\ell }} + \lambda \left( \sum \limits _{\ell =2}^{n-2} m_{\ell } - \bar{T} \right) \, . \end{aligned}$$

We solve for \(\nabla L = 0\) which requires

$$\begin{aligned} \frac{\partial }{\partial m_{\ell }} L = - \frac{c_{\ell }}{n^2 m_{\ell }^2} + \lambda {\mathop {=}\limits ^{!}} 0 \qquad \forall \ell \in \mathcal {L}_{\nu }' \quad \text {and} \quad \frac{\partial }{\partial \lambda } L = \sum \limits _{\ell =2}^{n-2} m_{\ell } - \bar{T} {\mathop {=}\limits ^{!}} 0 \, . \end{aligned}$$

Together with the budget constraint, this system of equations yields

$$\begin{aligned} m_{\ell } = \sqrt{\frac{c_{\ell }}{\lambda n^2}} \quad \forall \ell \in \mathcal {L}_{\nu }' \quad \text {and} \quad \lambda = \frac{1}{n^2 \bar{T}^2} \left( \sum \limits _{\ell =2}^{n-2} \sqrt{c_{\ell }} \right) ^2 \, , \end{aligned}$$

from which we derive the optimal allocation minimizing the objective function:

$$\begin{aligned} m_{\ell } = \frac{\sqrt{c_{\ell }}}{\sum \limits _{k=2}^{n-2} \sqrt{c_k}} \cdot \bar{T} = \frac{\sqrt{\sum \limits _{i=1}^n \left( \frac{\sigma _{i,\ell -1,+}^2}{\ell } + \frac{\sigma _{i,\ell ,-}^2}{n-\ell } \right) }}{\sum \limits _{k=2}^{n-2} \sqrt{\sum \limits _{i=1}^n \left( \frac{\sigma _{i,k-1,+}^2}{k} + \frac{\sigma _{i,k,-}^2}{n-k} \right) }} \cdot \bar{T} \, . \end{aligned}$$

   \(\square \)
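
The stationarity conditions of this derivation can be spot-checked symbolically; the snippet below (ours) substitutes the claimed solution into \(\partial L / \partial m_{\ell }\) for three coalition sizes:

```python
import sympy as sp

n, lam = sp.symbols("n lambda", positive=True)
for c in sp.symbols("c2 c3 c4", positive=True):  # coefficients c_l for three sizes
    m = sp.sqrt(c / (lam * n**2))                # claimed optimal m_l
    dL = -c / (n**2 * m**2) + lam                # partial derivative of the Lagrangian
    assert sp.simplify(dL) == 0
print("stationarity conditions hold")
```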

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kolpaczki, P., Haselbeck, G., Hüllermeier, E. (2024). How Much Can Stratification Improve the Approximation of Shapley Values?. In: Longo, L., Lapuschkin, S., Seifert, C. (eds) Explainable Artificial Intelligence. xAI 2024. Communications in Computer and Information Science, vol 2154. Springer, Cham. https://doi.org/10.1007/978-3-031-63797-1_25

  • DOI: https://doi.org/10.1007/978-3-031-63797-1_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-63796-4

  • Online ISBN: 978-3-031-63797-1

  • eBook Packages: Computer Science, Computer Science (R0)
