Abstract
Over the last decade, the Shapley value has become one of the most widely applied tools for providing post-hoc explanations of black-box models. However, its theoretically justified solution to the problem of dividing a collective benefit among the members of a group, such as features or data points, comes at a price. Without strong assumptions, the exponential number of member subsets precludes an exact calculation of the Shapley value. In search of a remedy, recent works have demonstrated the efficacy of approximations based on sampling with stratification, in which the sample space is partitioned into smaller subpopulations. The effectiveness of this technique mainly depends on the degree to which the allocation of available samples over the formed strata mirrors their unknown variances. To uncover the hypothetical potential of stratification, we investigate the gap in approximation quality caused by the lack of knowledge of the optimal allocation. Moreover, we combine recent advances to propose two state-of-the-art algorithms, Adaptive SVARM and Continuous Adaptive SVARM, that adjust the sample allocation on the fly. The potential of our approach is assessed in an empirical evaluation.
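As a concrete, deliberately simplistic illustration of the stratification idea, the sketch below estimates Shapley values by sampling marginal contributions per (player, coalition-size) stratum with a uniform budget split. All names are illustrative; the adaptive algorithms proposed in the paper replace the uniform split with a variance-driven allocation.

```python
import random

def shapley_stratified(value, n, budget, rng=random):
    """Stratified Monte Carlo estimate of Shapley values: stratum
    (i, l) holds player i's marginal contributions to coalitions of
    size l drawn uniformly from the remaining players. The budget is
    split uniformly over the n * n strata (non-adaptive baseline)."""
    estimates = []
    per_stratum = max(1, budget // (n * n))
    for i in range(n):
        others = [j for j in range(n) if j != i]
        strata_means = []
        for l in range(n):  # coalition sizes 0, ..., n-1
            total = 0.0
            for _ in range(per_stratum):
                coalition = set(rng.sample(others, l))
                total += value(coalition | {i}) - value(coalition)
            strata_means.append(total / per_stratum)
        # Shapley value = uniform average over the per-size strata means
        estimates.append(sum(strata_means) / n)
    return estimates
```

For an additive game the marginal contributions within every stratum are constant, so the estimate is exact regardless of the allocation.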
References
European commission. Ethics guidelines for trustworthy AI (2019). https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai
Ancona, M., Öztireli, C., Gross, M.H.: Explaining deep neural networks with a polynomial time algorithm for shapley value approximation. In: Proceedings of International Conference on Machine Learning (ICML), pp. 272–281 (2019)
Aumann, R.J.: Economic applications of the shapley value. In: Mertens, J.F., Sorin, S. (eds.) Game-Theoretic Methods in General Equilibrium Analysis. NATO ASI Series, vol. 77, pp. 121–133. Springer, Dordrecht (1994). https://doi.org/10.1007/978-94-017-1656-7_12
Burgess, M.A., Chapman, A.C.: Approximating the shapley value using stratified empirical bernstein sampling. In: Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), pp. 73–81 (2021)
Castro, J., Gómez, D., Molina, E., Tejada, J.: Improving polynomial estimation of the shapley value by stratified random sampling with optimum allocation. Comput. Oper. Res. 82, 180–188 (2017)
Castro, J., Gómez, D., Tejada, J.: Polynomial calculation of the shapley value based on sampling. Comput. Oper. Res. 36(5), 1726–1730 (2009)
Covert, I., Lundberg, S., Lee, S.I.: Shapley feature utility. In: Machine Learning in Computational Biology (2019)
Covert, I., Lundberg, S.M., Lee, S.: Understanding global feature contributions with additive importance measures. In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS) (2020)
Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009)
Deng, X., Papadimitriou, C.H.: On the complexity of cooperative solution concepts. Math. Oper. Res. 19(2), 257–266 (1994)
Fanaee-T, H.: Bike Sharing. UCI Machine Learning Repository (2013) https://doi.org/10.24432/C5W894
Ghorbani, A., Zou, J.Y.: Data shapley: equitable valuation of data for machine learning. In: Proceedings of International Conference on Machine Learning (ICML), pp. 2242–2251 (2019)
Ghorbani, A., Zou, J.Y.: Neuron shapley: discovering the responsible neurons. In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS) (2020)
Hofmann, H.: Statlog (German Credit Data). UCI Machine Learning Repository (1994). https://doi.org/10.24432/C5NC77
Illés, F., Kerényi, P.: Estimation of the shapley value by ergodic sampling. CoRR abs/1906.05224 (2019)
Jia, R., et al.: Towards efficient data valuation based on the shapley value. In: Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 1167–1176 (2019)
Kohavi, R.: Scaling up the accuracy of Naive-Bayes classifiers: a decision-tree hybrid. In: Proceedings of International Conference on Knowledge Discovery and Data Mining (KDD), pp. 202–207 (1996)
Kolpaczki, P., Bengs, V., Muschalik, M., Hüllermeier, E.: Approximating the shapley value without marginal contributions. In: Proceedings of AAAI Conference on Artificial Intelligence (AAAI), pp. 13246–13255 (2024)
Lundberg, S.M., Erion, G.G., Lee, S.: Consistent individualized feature attribution for tree ensembles. CoRR abs/1802.03888 (2018)
Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS), pp. 4768–4777 (2017)
Maleki, S., Tran-Thanh, L., Hines, G., Rahwan, T., Rogers, A.: Bounding the estimation error of sampling-based shapley value approximation with/without stratifying. CoRR abs/1306.4265 (2013)
Mitchell, R., Cooper, J., Frank, E., Holmes, G.: Sampling permutations for shapley value estimation. J. Mach. Learn. Res. 23(43), 1–46 (2022)
Moro, S., Laureano, R., Cortez, P.: Using data mining for bank direct marketing: an application of the CRISP-DM methodology. In: Proceedings of European Simulation and Modelling Conference (ESM) (2011)
Neyman, J.: On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. J. Roy. Stat. Soc. 97(4), 558–625 (1934)
O’Brien, G., Gamal, A.E., Rajagopal, R.: Shapley value estimation for compensation of participants in demand response programs. IEEE Trans. Smart Grid 6(6), 2837–2844 (2015)
Okhrati, R., Lipani, A.: A multilinear sampling algorithm to estimate shapley values. In: Proceedings of International Conference on Pattern Recognition (ICPR), pp. 7992–7999 (2020)
Rozemberczki, B., Sarkar, R.: The shapley value of classifiers in ensemble games. In: Proceedings of ACM International Conference on Information and Knowledge Management (CIKM), pp. 1558–1567 (2021)
Rozemberczki, B., et al.: The shapley value in machine learning. In: Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), pp. 5572–5579 (2022)
Shapley, L.S.: A value for n-person games. In: Contributions to the Theory of Games (AM-28), Volume II, pp. 307–318. Princeton University Press (1953)
van Campen, T., Hamers, H., Husslage, B., Lindelauf, R.: A new approximation method for the shapley value applied to the WTC 9/11 terrorist attack. Soc. Netw. Anal. Min. 8(3), 1–12 (2018)
Acknowledgments
This research was supported by the research training group Dataninja (Trustworthy AI for Seamless Problem Solving: Next Generation Intelligence Joins Robust Data Analysis) funded by the German federal state of North Rhine-Westphalia.
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Appendices
A Pseudocode
CalculateAllocation begins by computing the variance estimates from the observations collected during the exploration phase. Next, it estimates the optimal allocation in potentially multiple iterations. The sizes \(\ell \) that are still part of the optimization problem are kept in \(\mathcal {M}\). Each size whose sample number \(\hat{m}_{\ell }^*\) does not exceed the number \(c_{\ell }\) of samples it received during the exploration phase, i.e. \(\hat{m}_{\ell }^* \le c_{\ell }\), is assigned after the computation to the set of dropouts \(\mathcal {F}\). The considered available budget \(\bar{T}'\) is reduced by the sum of the sample numbers of the current dropouts. We assume Round to return a vector of natural numbers by rounding appropriately.
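Under assumed data structures (all names here are illustrative, not the paper's exact pseudocode), the allocation-with-dropouts loop described above can be sketched as follows: strata whose square-root-proportional share does not exceed their exploration count are frozen, and the remaining budget is redistributed over the active sizes.

```python
import math

def calculate_allocation(sigma2, c, budget):
    """Neyman-style allocation with dropouts (illustrative sketch):
    sigma2[l] is the estimated variance of size l, c[l] the samples
    it already received during exploration, budget the total T'."""
    active = set(range(len(sigma2)))   # sizes still optimized (the set M)
    dropouts = set()                   # the set F
    m = {}
    while True:
        # reduce the budget by the exploration samples of the dropouts
        remaining = budget - sum(c[l] for l in dropouts)
        denom = sum(math.sqrt(sigma2[l]) for l in active)
        m = {l: remaining * math.sqrt(sigma2[l]) / denom for l in active}
        newly = {l for l in active if m[l] <= c[l]}
        if not newly:
            break
        dropouts |= newly
        active -= newly
    # round to natural numbers, as assumed for Round
    return {l: round(m[l]) for l in active}
```

A size already over-sampled during exploration (here size 1 with c = 12) is dropped, and its budget share is reassigned to the remaining sizes.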
Update updates the affected stratum estimates identically to [18] by iterating over all players. The variance estimation is prepared by maintaining for each stratum the sum of observed coalition values in \(\varSigma _{i,\ell }^{+/-}\) and the sum of squared values in \(\varOmega _{i,\ell }^{+/-}\). Note that the value function is only accessed once.
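The bookkeeping described above amounts to maintaining, per stratum, a count, the running sum \(\varSigma \), and the running sum of squares \(\varOmega \), from which an empirical variance follows at any time. A minimal sketch with dictionary-based strata (illustrative, not the paper's implementation):

```python
def update(stratum, value):
    """Record one observed coalition value in a stratum's running sums."""
    stratum["count"] += 1
    stratum["sigma"] += value        # Sigma: sum of observed values
    stratum["omega"] += value ** 2   # Omega: sum of squared values

def empirical_variance(stratum):
    """Plug-in variance estimate from the two running sums."""
    m = stratum["count"]
    mean = stratum["sigma"] / m
    return stratum["omega"] / m - mean ** 2
```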
ExactCalculation iterates over all coalitions of size \(0, 1, n-1, n\). Each coalition is used to call the Update procedure. After spending \(2n+2\) budget tokens, the strata values \(\phi _{i,0}^-, \phi _{i,0}^+, \phi _{i,1}^-, \phi _{i,n-2}^+, \phi _{i,n-1}^-, \phi _{i,n-1}^+\) are computed exactly.
ExactCalculation [18]
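The coalition sizes enumerated by ExactCalculation can be generated directly; the count of \(1 + n + n + 1 = 2n+2\) coalitions matches the budget stated above (a sketch with players labeled 0 to n-1, assuming \(n \ge 3\) so the four sizes are distinct):

```python
from itertools import combinations

def border_coalitions(n):
    """Yield all coalitions of sizes 0, 1, n-1 and n over n players;
    there are exactly 2n + 2 of them, one value-function call each."""
    players = range(n)
    for size in (0, 1, n - 1, n):
        yield from combinations(players, size)
```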
B Proofs
Static Allocation for Stratified Sampling. Proof of Theorem 1: Let \(x_{i,\ell }^{(m)}\) be the m-th sampled marginal contribution of player i’s \(\ell \)-th stratum. Since \(m_{i,\ell } \ge 1\) for all \(i \in \mathcal {N}\) and \(\ell \in \mathcal {L}_{\varDelta }\), and the marginal contributions are drawn uniformly at random from their stratum, each estimate \(\hat{\phi }_{i,\ell }\) and \(\hat{\phi }_i\) is unbiased:
Next, we investigate the variance, where we make use of the independence of the samples:
The bias-variance decomposition allows us to combine both intermediate results and to quantify the MSE for a single player. Averaging over all players yields:
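Since the displayed equations are not reproduced in this version, the following generic identity records the decomposition being invoked (a sketch in the notation of the proof, not the paper's exact derivation): for an unbiased estimator the squared bias vanishes and the MSE reduces to the variance.

```latex
% Bias–variance decomposition for the estimate \hat{\phi}_i of player i:
\mathbb{E}\left[ \left( \hat{\phi}_i - \phi_i \right)^2 \right]
  = \underbrace{\left( \mathbb{E}\left[ \hat{\phi}_i \right] - \phi_i \right)^2}_{= \, 0 \ \text{(unbiasedness)}}
  + \operatorname{Var}\left( \hat{\phi}_i \right) .
```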
\(\square \)
Static Allocation for Stratified SVARM. Proof of Theorem 2: The unbiasedness of each estimate \(\hat{\phi }_{i,\ell }^+\), \(\hat{\phi }_{i,\ell }^-\), and \(\hat{\phi }_i\) has been shown in Lemma 8, Lemma 9, and Theorem 5 of [18]. Let \(\bar{m}_{i,\ell ,+}, \bar{m}_{i,\ell ,-}\) be the numbers of times the estimates \(\hat{\phi }_{i,\ell }^+\) and \(\hat{\phi }_{i,\ell }^-\), respectively, were updated after the warmup, and let \(m_{i,\ell ,+}, m_{i,\ell ,-}\) be the total numbers including the warmup.
Lemma 1
For any \(i \in \mathcal {N}\) and \(\ell \in \mathcal {L}_{\nu }'\) the numbers of updates are binomially distributed with \( \bar{m}_{i,\ell -1,+} \sim Bin\left( m_{\ell }, \frac{\ell }{n} \right) \) and \(\bar{m}_{i,\ell ,-} \sim Bin\left( m_{\ell }, \frac{n-\ell }{n} \right) \).
The proof is analogous to Lemma 10 in [18]. We utilize Lemma 1 to bound the following expectations for all \(i \in \mathcal {N}\) and all \(\ell \in \mathcal {L}_{\nu }'\) (cf. Lemma 13 [18]):
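The success probability in Lemma 1 reflects that a fixed player i is contained in exactly \(\binom{n-1}{\ell -1}\) of the \(\binom{n}{\ell }\) coalitions of size \(\ell \), i.e. a fraction \(\ell /n\). A quick numerical check (illustrative helper name):

```python
from math import comb

def stratum_hit_probability(n, l):
    """Probability that a uniformly drawn size-l coalition over n
    players contains a fixed player i: C(n-1, l-1) / C(n, l) = l / n,
    the success probability of the binomial law in Lemma 1."""
    return comb(n - 1, l - 1) / comb(n, l)
```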
From the proof of Theorem 6 in [18] we extract and reuse:
The bound on the variance enables us to take advantage of the bias-variance decomposition, since unbiasedness has already been established:
Averaging the mean squared error over the players completes the proof:
\(\square \)
Optimal Allocation for Stratified Sampling. Proof of Theorem 3: The relaxed sample allocation problem for marginal contributions is given by:
We derive the solution by employing Lagrange multipliers; hence, we minimize
We solve for \(\nabla L = 0\) which requires
This equation system yields together with the budget constraint
from which we derive the optimal allocation to minimize the objective function:
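As the displayed formulas are not reproduced here, the following sketch spells out the standard Lagrangian argument in simplified notation, with a single stratum index \(\ell \), strata standard deviations \(\sigma _{\ell }\), and total budget \(T\); the paper's actual problem additionally ranges over players.

```latex
% Relaxed allocation problem (simplified, single stratum index):
\min_{m_1, \dots, m_k > 0} \ \sum_{\ell} \frac{\sigma_{\ell}^2}{m_{\ell}}
  \quad \text{s.t.} \quad \sum_{\ell} m_{\ell} = T .
% Lagrangian and stationarity condition:
L(m, \lambda) = \sum_{\ell} \frac{\sigma_{\ell}^2}{m_{\ell}}
  + \lambda \left( \sum_{\ell} m_{\ell} - T \right) ,
\qquad
\frac{\partial L}{\partial m_{\ell}} = -\frac{\sigma_{\ell}^2}{m_{\ell}^2} + \lambda = 0
  \ \Longrightarrow \ m_{\ell} = \frac{\sigma_{\ell}}{\sqrt{\lambda}} .
% Substituting into the budget constraint gives the Neyman-type allocation:
m_{\ell}^* = T \cdot \frac{\sigma_{\ell}}{\sum_{\ell'} \sigma_{\ell'}} .
```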
\(\square \)
Optimal Allocation for Stratified SVARM. Proof of Theorem 4: The relaxed sample allocation problem for coalitions is given by:
where we define the coefficients \(c_{\ell } := \sum \limits _{i=1}^n \left( \frac{\sigma _{i,\ell -1,+}^2}{\ell } + \frac{\sigma _{i,\ell ,-}^2}{n-\ell } \right) \). We derive the solution by employing Lagrange multipliers; hence, we minimize
We solve for \(\nabla L = 0\) which requires
This equation system yields together with the budget constraint
from which we derive the optimal allocation to minimize the objective function:
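The resulting square-root rule can be checked numerically. This sketch (hypothetical function names, not the paper's pseudocode) allocates a budget proportionally to \(\sqrt{c_{\ell }}\) and verifies that it attains the closed-form optimum \(\left( \sum _{\ell } \sqrt{c_{\ell }} \right)^2 / T\) of the relaxed problem.

```python
import math

def sqrt_proportional_allocation(c, budget):
    """Continuous minimizer of sum(c[l] / m[l]) subject to
    sum(m[l]) = budget: allocate proportionally to sqrt(c[l])."""
    denom = sum(math.sqrt(x) for x in c)
    return [budget * math.sqrt(x) / denom for x in c]

def objective(c, m):
    """The relaxed objective: sum over strata of c[l] / m[l]."""
    return sum(ci / mi for ci, mi in zip(c, m))
```

For c = [4, 1, 1] and T = 12 the rule yields m = [6, 3, 3] with objective 4/3, beating the uniform allocation [4, 4, 4] with objective 3/2.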
\(\square \)
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kolpaczki, P., Haselbeck, G., Hüllermeier, E. (2024). How Much Can Stratification Improve the Approximation of Shapley Values?. In: Longo, L., Lapuschkin, S., Seifert, C. (eds) Explainable Artificial Intelligence. xAI 2024. Communications in Computer and Information Science, vol 2154. Springer, Cham. https://doi.org/10.1007/978-3-031-63797-1_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-63796-4
Online ISBN: 978-3-031-63797-1