Evaluating Decision Makers over Selectively Labelled Data: A Causal Modelling Approach

Laine, Riku; Hyttinen, Antti; Mathioudakis, Michael

doi:10.1007/978-3-030-61527-7_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12323)

Included in the following conference series:

International Conference on Discovery Science

Abstract

We present a Bayesian approach to evaluate AI decision systems using data from past decisions. Our approach addresses two challenges that are typically encountered in such settings and prevent a direct evaluation. First, the data may not include all factors that affected past decisions. Second, past decisions may have led to unobserved outcomes. This is the case, for example, when a bank decides whether a customer should be granted a loan, and the outcome of interest is whether the customer will repay the loan. In this case, the data includes the outcome (whether the loan was repaid) only for customers who were granted the loan, not for those who were denied. To address these challenges, we formalize the decision-making process with a causal model that also accounts for unobserved features. Based on this model, we compute counterfactuals to impute missing outcomes, which in turn allows us to produce accurate evaluations. As we demonstrate on real and synthetic data, our approach estimates the quality of decisions more accurately and robustly than previous methods.
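
The loan example can be illustrated with a small simulation of selectively labelled data. This is a minimal sketch, not the chapter's experimental setup: the logistic repayment model, the threshold decision rule, and all names (`simulate_loans`, `x`, `z`) are assumptions made for illustration only.

```python
import math
import random


def simulate_loans(n=20000, seed=0):
    """Simulate loan decisions; repayment is recorded only for granted loans."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # feature recorded in the data
        z = rng.gauss(0.0, 1.0)  # feature the decision maker saw but the data lacks
        p_repay = 1.0 / (1.0 + math.exp(-(x + z)))  # assumed repayment model
        granted = x + z > 0.0  # assumed decision rule: grant to low-risk customers
        would_repay = rng.random() < p_repay  # outcome if the loan were granted
        observed = would_repay if granted else None  # label is missing when denied
        records.append({"x": x, "granted": granted,
                        "would_repay": would_repay, "observed": observed})
    return records


records = simulate_loans()
labelled = [r["observed"] for r in records if r["granted"]]
labelled_rate = sum(labelled) / len(labelled)
population_rate = sum(r["would_repay"] for r in records) / len(records)
print(f"repayment rate among granted loans:      {labelled_rate:.3f}")
print(f"repayment rate if everyone were granted: {population_rate:.3f}")
```

In this sketch the repayment rate computed from labelled cases alone is higher than the rate that would be seen if every customer were granted a loan, because the unrecorded feature `z` influenced both the decision and the outcome.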

Notes

1. https://version.helsinki.fi/rikulain/CFBI-public

References

1. Austin, P.C.: An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivar. Behav. Res. 46(3), 399–424 (2011)
2. Brennan, T., Dieterich, W., Ehret, B.: Evaluating the predictive validity of the COMPAS risk and needs assessment system. Crim. Justice Behav. 36(1), 21–40 (2009)
3. Corbett-Davies, S., Pierson, E., Feller, A., Goel, S., Huq, A.: Algorithmic decision making and the cost of fairness. In: Proceedings of the ACM SIGKDD (2017)
4. Coston, A., Mishler, A., Kennedy, E.H., Chouldechova, A.: Counterfactual risk assessments, evaluation, and fairness. In: Proceedings of the FAT, pp. 582–593 (2020)
5. De-Arteaga, M., Dubrawski, A., Chouldechova, A.: Learning under selective labels in the presence of expert consistency. arXiv preprint arXiv:1807.00905 (2018)
6. Hernán, M.A., Hernández-Díaz, S., Robins, J.M.: A structural approach to selection bias. Epidemiology 15(5), 615–625 (2004)
7. Jung, J., Concannon, C., Shroff, R., Goel, S., Goldstein, D.G.: Simple rules to guide expert classifications. J. Roy. Stat. Soc.: Ser. A 183, 771–800 (2020)
8. Jung, J., Shroff, R., Feller, A., Goel, S.: Bayesian sensitivity analysis for offline policy evaluation. In: Proceedings of the AIES (2020)
9. Kallus, N., Zhou, A.: Confounding-robust policy improvement. In: Advances in Neural Information Processing Systems, pp. 9269–9279 (2018)
10. Kleinberg, J., Lakkaraju, H., Leskovec, J., Ludwig, J., Mullainathan, S.: Human decisions and machine predictions. Q. J. Econ. 133(1), 237–293 (2018)
11. Kusner, M.J., Russell, C., Loftus, J.R., Silva, R.: Making decisions that reduce discriminatory impacts. In: Proceedings of the ICML (2019)
12. Lakkaraju, H., Kleinberg, J., Leskovec, J., Ludwig, J., Mullainathan, S.: The selective labels problem: evaluating algorithmic predictions in the presence of unobservables. In: Proceedings of the ACM SIGKDD (2017)
13. Little, R.J., Rubin, D.B.: Statistical Analysis with Missing Data, vol. 793. Wiley, Hoboken (2019)
14. Madras, D., Creager, E., Pitassi, T., Zemel, R.: Fairness through causal awareness: learning causal latent-variable models for biased data. In: Proceedings of the FAT (2019)
15. McCandless, L.C., Gustafson, P.: A comparison of Bayesian and Monte Carlo sensitivity analysis for unmeasured confounding. Stat. Med. 36(18), 2887–2901 (2017)
16. McCandless, L.C., Gustafson, P., Levy, A.: Bayesian sensitivity analysis for unmeasured confounding in observational studies. Stat. Med. 26(11), 2331–2347 (2007)
17. Pearl, J.: An introduction to causal inference. Int. J. Biostat. 6(2) (2010). https://doi.org/10.2202/1557-4679.1203
18. Rosenbaum, P.R., Rubin, D.B.: The central role of the propensity score in observational studies for causal effects. Biometrika 70(1), 41–55 (1983)
19. Thomas, P.S., Brunskill, E.: Data-efficient off-policy policy evaluation for reinforcement learning. In: Proceedings of the ICML (2016)
20. Tolan, S., Miron, M., Gómez, E., Castillo, C.: Why machine learning may lead to unfairness: evidence from risk assessment for Juvenile justice in Catalonia. In: Proceedings of the Artificial Intelligence and Law (2019)
21. Zhang, J., Bareinboim, E.: Fairness in decision-making - the causal explanation formula. In: Proceedings of the AAAI (2018)

Acknowledgments

The authors acknowledge the computing capacity provided by the Finnish Grid and Cloud Infrastructure (urn:nbn:fi:research-infras-2016072533). RL was supported by HICT; AH by Academy of Finland grants 295673, 316771 and by HIIT; and MM by Research Funds of the University of Helsinki.

Author information

Authors and Affiliations

University of Helsinki, Helsinki, Finland
Riku Laine
HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
Antti Hyttinen & Michael Mathioudakis

Corresponding author

Correspondence to Riku Laine.

Editor information

Editors and Affiliations

University of Bari Aldo Moro, Bari, Italy
Annalisa Appice
Aristotle University of Thessaloniki, Thessaloniki, Greece
Grigorios Tsoumakas
Open University of Cyprus, Nicosia, Cyprus
Yannis Manolopoulos
Dalhousie University, Halifax, NS, Canada
Stan Matwin

Appendices

Appendix 1 Counterfactual Inference

Here we derive Eq. 4 via Pearl’s counterfactual inference protocol, which involves three steps: abduction, action, and prediction [17]. Our model can be represented with the following structural equations over the graph structure in Fig. 2:

$$\begin{aligned} \mathsf {J}&:= \epsilon _{\mathsf {J}}, \quad \mathsf {Z}:= \epsilon _\mathsf {Z}, \quad \mathsf {X}:= \epsilon _\mathsf {X}, \quad \mathsf {T}:= g(\mathsf {J},\mathsf {X},\mathsf {Z},\epsilon _{\mathsf {T}}), \quad \mathsf {Y}:= f(\mathsf {T},\mathsf {X},\mathsf {Z},\epsilon _\mathsf {Y}). \end{aligned}$$

For any case where $\mathsf {T} =0$ in the data, we calculate the counterfactual value of $\mathsf {Y}$ that would have been obtained had we had $\mathsf {T} =1$. We assume here that all parameters, functions and distributions are known. In the abduction step we determine $\mathbf {P}(\epsilon _\mathsf {J}, \epsilon _\mathsf {Z}, \epsilon _\mathsf {X}, \epsilon _{\mathsf {T}},\epsilon _\mathsf {Y} |j,x,\mathsf {T} =0)$, the distribution of the stochastic disturbance terms updated to take into account the observed evidence on the decision maker, the observed features and the decision (given the decision $\mathsf {T} =0$, the disturbances are independent of $\mathsf {Y}$). We directly know $\epsilon _\mathsf {X} =x$ and $\epsilon _\mathsf {J} =j$. Due to the special form of $f$, the observed evidence is independent of $\epsilon _\mathsf {Y}$ when $\mathsf {T} = 0$, so we only need to determine $\mathbf {P}(\epsilon _\mathsf {Z},\epsilon _{\mathsf {T}}|j,x,\mathsf {T} =0)$. Next, the action step sets $\mathsf {T} =1$ by intervention. Finally, in the prediction step we estimate $\mathsf {Y}$:

where we used $\epsilon _\mathsf {Z} =z$ and integrated out $\epsilon _\mathsf {T}$ and $\epsilon _\mathsf {Y}$. This gives us the counterfactual expectation of $\mathsf {Y}$ for a single subject.
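
The prediction step can be written out from the structural equations and the abduction result. The display below is a reconstruction from the surrounding text under stated assumptions, not a verbatim copy of the chapter's equation: the notation $\mathbf {E}_{\mathsf {T} \leftarrow 1}$ for the counterfactual expectation is introduced here for illustration, and the second line assumes a binary outcome $\mathsf {Y} \in \{0,1\}$.

$$\begin{aligned} \mathbf {E}_{\mathsf {T} \leftarrow 1}(\mathsf {Y} \mid j,x,\mathsf {T} =0)&= \int f(1,x,\epsilon _\mathsf {Z},\epsilon _\mathsf {Y})\, \mathbf {P}(\epsilon _\mathsf {Z},\epsilon _{\mathsf {T}},\epsilon _\mathsf {Y} \mid j,x,\mathsf {T} =0)\, d\epsilon _\mathsf {Z}\, d\epsilon _{\mathsf {T}}\, d\epsilon _\mathsf {Y} \\&= \int \mathbf {P}(\mathsf {Y} =1 \mid \mathsf {T} =1,x,z)\, \mathbf {P}(z \mid j,x,\mathsf {T} =0)\, dz \end{aligned}$$

The step from the first line to the second uses the independence of $\epsilon _\mathsf {Y}$ from the observed evidence when $\mathsf {T} =0$: the posterior factorizes into $\mathbf {P}(\epsilon _\mathsf {Y})\,\mathbf {P}(\epsilon _\mathsf {Z},\epsilon _{\mathsf {T}} \mid j,x,\mathsf {T} =0)$, integrating $f$ over $\epsilon _\mathsf {Y}$ gives $\mathbf {P}(\mathsf {Y} =1 \mid \mathsf {T} =1,x,z)$, and integrating over $\epsilon _{\mathsf {T}}$ leaves the posterior of $z$.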

Appendix 2 On the Priors of the Bayesian Model

The priors for $\gamma _\mathsf {X},~\beta _\mathsf {X},~\gamma _\mathsf {Z} $ and $\beta _\mathsf {Z} $ were defined using the gamma-mixture representation of Student’s t-distribution with $\nu =6$ degrees of freedom. The gamma-mixture is obtained by first sampling a precision parameter from $\varGamma (3, 3)$, that is $\varGamma (\nu /2, \nu /2)$, and then drawing the coefficient from a zero-mean Gaussian with that precision. This procedure was applied with the precision parameters $\eta _\mathsf {Z},~\eta _{\beta _\mathsf {X}}$ and $\eta _{\gamma _\mathsf {X}}$ as shown below. For vector-valued $\mathsf {X}$, the components of $\gamma _\mathsf {X} $ ($\beta _\mathsf {X} $) were sampled independently with a joint precision parameter $\eta _{\gamma _\mathsf {X}}$ ($\eta _{\beta _\mathsf {X}}$). The coefficients for the unobserved confounder $\mathsf {Z}$ were restricted to positive values to ensure identifiability.

$$\begin{aligned} \eta _\mathsf {Z}, \eta _{\beta _\mathsf {X}}, \eta _{\gamma _\mathsf {X}} \sim \varGamma (3, 3), \; \gamma _\mathsf {Z},\beta _\mathsf {Z} \sim N_+(0, \eta _\mathsf {Z} ^{-1}),\; \gamma _\mathsf {X} \sim N(0, \eta _{\gamma _\mathsf {X}}^{-1}),\; \beta _\mathsf {X} \sim N(0, \eta _{\beta _\mathsf {X}}^{-1}) \end{aligned}$$
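
The gamma-mixture construction can be checked numerically. The following is a minimal sketch using only the Python standard library; it assumes the shape-rate parameterization of the gamma distribution (under which a precision drawn from $\varGamma (\nu /2, \nu /2)$ yields a Student's t marginal with $\nu$ degrees of freedom), and the function name `sample_coefficient` is hypothetical.

```python
import math
import random
import statistics


def sample_coefficient(rng, nu=6.0, positive=False):
    """Draw one coefficient via the gamma mixture of zero-mean Gaussians."""
    # Precision ~ Gamma(shape=nu/2, rate=nu/2). random.gammavariate takes
    # (shape, scale), so the scale argument is 1/rate = 2/nu.
    precision = rng.gammavariate(nu / 2.0, 2.0 / nu)
    draw = rng.gauss(0.0, 1.0 / math.sqrt(precision))
    # N_+ (used for the coefficients of Z) is mimicked by taking |draw|.
    return abs(draw) if positive else draw


rng = random.Random(0)
draws = [sample_coefficient(rng) for _ in range(200_000)]
sample_var = statistics.pvariance(draws)
# A Student's t-distribution with nu = 6 has variance nu / (nu - 2) = 1.5.
print(f"sample variance: {sample_var:.3f} (theoretical value 1.5)")

positive_draws = [sample_coefficient(rng, positive=True) for _ in range(1000)]
```

The sample variance of the mixture draws should be close to the theoretical variance of the t-distribution, which is a quick sanity check that $\varGamma (3, 3)$ corresponds to $\nu =6$.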

The intercepts for the decision makers in the data and for the outcome $\mathsf {Y}$ had hierarchical Gaussian priors with variances $\sigma _\mathsf {T} ^2$ and $\sigma _\mathsf {Y} ^2$, respectively. All decision makers shared the joint variance parameter $\sigma _\mathsf {T} ^2$.

$$\begin{aligned} \sigma _\mathsf {T} ^2,~\sigma _\mathsf {Y} ^2 \sim N_+(0, \tau ^2),\quad \alpha _j \sim N(0, \sigma _\mathsf {T} ^2),\quad \alpha _\mathsf {Y} \sim N(0, \sigma _\mathsf {Y} ^2) \end{aligned}$$

The parameters $\sigma _\mathsf {T} ^2$ and $\sigma _\mathsf {Y} ^2$ were drawn independently from Gaussian distributions with mean 0 and variance $\tau ^2=1$, and restricted to the positive real axis.
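
A draw from the hierarchical intercept prior can be sketched as follows. This is an illustrative sketch only: the function name `sample_intercepts` and the returned dictionary keys are hypothetical, and the positive restriction $N_+$ is implemented by taking the absolute value of a zero-mean Gaussian draw.

```python
import math
import random


def sample_intercepts(n_decision_makers, tau=1.0, seed=0):
    """Draw intercepts for all decision makers and for the outcome."""
    rng = random.Random(seed)
    # sigma_T^2, sigma_Y^2 ~ N_+(0, tau^2): zero-mean Gaussian folded to positives.
    sigma2_t = abs(rng.gauss(0.0, tau))
    sigma2_y = abs(rng.gauss(0.0, tau))
    # All decision makers share the same variance sigma_T^2.
    alpha_j = [rng.gauss(0.0, math.sqrt(sigma2_t))
               for _ in range(n_decision_makers)]
    # The outcome intercept has its own variance sigma_Y^2.
    alpha_y = rng.gauss(0.0, math.sqrt(sigma2_y))
    return {"sigma2_T": sigma2_t, "sigma2_Y": sigma2_y,
            "alpha_j": alpha_j, "alpha_Y": alpha_y}


prior_draw = sample_intercepts(n_decision_makers=5)
print(prior_draw)
```

Sharing one variance parameter across decision makers means that their intercepts are pooled: a small $\sigma _\mathsf {T} ^2$ pulls all $\alpha _j$ towards zero together.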

About this paper

Cite this paper

Laine, R., Hyttinen, A., Mathioudakis, M. (2020). Evaluating Decision Makers over Selectively Labelled Data: A Causal Modelling Approach. In: Appice, A., Tsoumakas, G., Manolopoulos, Y., Matwin, S. (eds) Discovery Science. DS 2020. Lecture Notes in Computer Science (LNAI), vol 12323. Springer, Cham. https://doi.org/10.1007/978-3-030-61527-7_1

DOI: https://doi.org/10.1007/978-3-030-61527-7_1
Published: 15 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61526-0
Online ISBN: 978-3-030-61527-7
eBook Packages: Computer Science, Computer Science (R0)