Causal inference and counterfactual prediction in machine learning for actionable healthcare


Big data, high-performance computing and (deep) machine learning are increasingly key to precision medicine, from identifying disease risks and taking preventive measures to making diagnoses and personalizing treatment for individuals. Precision medicine, however, is not only about predicting risks and outcomes, but also about weighing interventions. Interventional clinical predictive models require the correct specification of cause and effect, and the calculation of so-called counterfactuals, that is, alternative scenarios. In biomedical research, observational studies are commonly affected by confounding and selection bias. Without robust assumptions, which often require a priori domain knowledge, causal inference is not feasible. Data-driven prediction models are often mistakenly used to draw causal conclusions, but neither their parameters nor their predictions necessarily have a causal interpretation. Therefore, the premise that data-driven prediction models lead to trustworthy decisions or interventions in precision medicine is questionable. When pursuing intervention modelling, the bio-health informatics community needs to employ causal approaches and learn causal structures. Here we discuss how target trials (algorithmic emulation of randomized studies), transportability (the licence to transfer causal effects from one population to another) and prediction invariance (where a true causal model is contained in the set of all prediction models whose accuracy does not vary across different settings) are linchpins for developing and testing intervention models.
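The claim that a prediction model's output need not have a causal interpretation (cf. Fig. 1, conditional versus interventional probabilities) can be made concrete with a toy example. The following is a minimal sketch, assuming a hypothetical three-variable model with made-up probabilities that do not come from the article: a confounder Z (for example, disease severity) influences both a treatment X and a recovery outcome Y. It contrasts the conditional probability P(Y=1 | X=x), which a purely predictive model would learn from observational data, with the interventional probability P(Y=1 | do(X=x)), computed here by back-door adjustment over Z. The adjustment is valid only because Z is assumed to be the sole confounder.

```python
# Hypothetical discrete causal model (illustrative numbers, not from the article):
#   Z: confounder (e.g., disease severity), X: treatment, Y: recovery
#   Assumed graph: Z -> X, Z -> Y, X -> Y

P_Z = {0: 0.5, 1: 0.5}              # P(Z = z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}     # P(X = 1 | Z = z): sicker patients are treated more often
P_Y1_GIVEN_XZ = {                   # P(Y = 1 | X = x, Z = z)
    (0, 0): 0.5, (1, 0): 0.7,
    (0, 1): 0.2, (1, 1): 0.4,
}


def p_x_given_z(x: int, z: int) -> float:
    """P(X = x | Z = z) for binary X."""
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1.0 - p1


def conditional(x: int) -> float:
    """P(Y=1 | X=x): the quantity a purely predictive model estimates."""
    # Joint P(X=x, Z=z); dividing by P(X=x) gives P(Z=z | X=x) by Bayes' rule
    joint = {z: p_x_given_z(x, z) * P_Z[z] for z in P_Z}
    p_x = sum(joint.values())
    return sum(P_Y1_GIVEN_XZ[(x, z)] * joint[z] / p_x for z in P_Z)


def interventional(x: int) -> float:
    """P(Y=1 | do(X=x)) by back-door adjustment: weight by P(Z=z), not P(Z=z | X=x)."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * P_Z[z] for z in P_Z)


if __name__ == "__main__":
    assoc = conditional(1) - conditional(0)
    causal = interventional(1) - interventional(0)
    print(f"P(Y=1|X=1) - P(Y=1|X=0)         = {assoc:.2f}")   # 0.02
    print(f"P(Y=1|do(X=1)) - P(Y=1|do(X=0)) = {causal:.2f}")  # 0.20
```

With these assumed numbers, treatment looks almost useless observationally (a difference of about 0.02) because sicker patients are treated more often, whereas the interventional effect is 0.20. The correction depends entirely on the assumed causal graph, which is the kind of a priori domain knowledge the abstract says causal inference requires.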

Fig. 1: Conditional versus interventional probabilities.
Fig. 2: Examples of confounding bias and collider bias.
Fig. 3: An example of M-bias.
Fig. 4: A selection diagram for illustrating transportability.

J.B.’s, Y.G.’s and M.P.’s research for this work was in part supported by the University of Florida (UF)’s Creating the Healthiest Generation—Moonshot initiative, supported by the UF Office of the Provost, UF Office of Research, UF Health, UF College of Medicine and UF Clinical and Translational Science Institute. M.W.’s research for this work was supported in part by the Lanzillotti–McKethan Eminent Scholar Endowment.

M.P., Y.G., J.B. and M.W. conceived the premise, wrote the paper, designed the figures and tables, and revised the paper. M.S., X.E. and S.R. contributed to specific sections and helped with the figures, tables and revision. J.K., I.B. and J.M. contributed to specific sections and helped with revisions.

Correspondence to Mattia Prosperi.

Competing interests

The authors declare no competing interests.


Prosperi, M., Guo, Y., Sperrin, M. et al. Causal inference and counterfactual prediction in machine learning for actionable healthcare. Nat Mach Intell 2, 369–375 (2020).

