
  • Perspective

Stable learning establishes some common ground between causal inference and machine learning

Abstract

Causal inference has recently attracted substantial attention in the machine learning and artificial intelligence community. It is usually positioned as a distinct strand of research that can broaden the scope of machine learning from predictive modelling to intervention and decision-making. In this Perspective, however, we argue that ideas from causality can also be used to improve the stronghold of machine learning, predictive modelling, if predictive stability, explainability and fairness are important. With the aim of bridging the gap between the tradition of precise modelling in causal inference and black-box approaches from machine learning, stable learning is proposed and developed as a source of common ground. This Perspective clarifies a source of risk for machine learning models and discusses the benefits of bringing causality into learning. We identify the fundamental problems addressed by stable learning, as well as the latest progress from both causal inference and learning perspectives, and we discuss relationships with explainability and fairness problems.
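To make the reweighting idea concrete, the following is a minimal, hypothetical sketch in Python. It learns sample weights that drive the weighted covariance between covariates towards zero on a toy dataset with a deliberately spurious correlation. The toy data-generating process, all variable names and the plain gradient-descent optimizer are illustrative assumptions, not the implementation of any specific stable-learning paper.

```python
# Hypothetical sketch (not the authors' implementation) of the sample-reweighting
# idea behind stable learning: learn sample weights under which covariates become
# nearly uncorrelated, so that a model fitted on the weighted sample leans less
# on spurious correlations. Toy data and names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: x2 is spuriously correlated with x1.
n = 2000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
X = np.column_stack([x1, x2])

def centred_and_offdiag_cov(w, X):
    """Weighted-centred covariates and the weighted covariance matrix
    with its diagonal zeroed out (only cross-covariances are penalized)."""
    mu = w @ X
    Xc = X - mu
    cov = (w[:, None] * Xc).T @ Xc
    return Xc, cov - np.diag(np.diag(cov))

# Gradient descent on log-weights; the softmax keeps the weights on the
# simplex. The loss is the sum of squared weighted covariances between
# all covariate pairs (illustrative optimizer, not tuned).
theta = np.zeros(n)
lr, steps = 100.0, 3000
for _ in range(steps):
    w = np.exp(theta - theta.max())
    w /= w.sum()
    Xc, off = centred_and_offdiag_cov(w, X)
    g = 2.0 * np.einsum('ij,jk,ik->i', Xc, off, Xc)  # dLoss / dw_i
    theta -= lr * w * (g - w @ g)                    # softmax chain rule

w = np.exp(theta - theta.max())
w /= w.sum()

def weighted_corr(w, X):
    """Weighted Pearson correlation between the two covariates."""
    mu = w @ X
    Xc = X - mu
    cov = (w[:, None] * Xc).T @ Xc
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

uniform = np.full(n, 1.0 / n)
print('corr(x1, x2), uniform weights:', weighted_corr(uniform, X))
print('corr(x1, x2), learned weights:', weighted_corr(w, X))
# A downstream predictive model would then be trained with these sample
# weights, for example via weighted least squares or a weighted loss.
```

This is only a schematic of the decorrelation step; published stable-learning methods differ in how they measure and remove statistical dependence (for example, handling nonlinear dependence or binary covariates) and in how the learned weights enter the final predictive model.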


Fig. 1: Three ways of generating correlations.
Fig. 2: The physical processes for generating datasets used in predictive modelling, occurring over time.
Fig. 3: Comparison of different learning paradigms.



Acknowledgements

Peng Cui’s research is supported by the National Key R&D Program of China (no. 2018AAA0102004), the National Natural Science Foundation of China (no. U1936219), the Beijing Academy of Artificial Intelligence (BAAI) and the Guoqiang Institute of Tsinghua University.

Author information


Corresponding author

Correspondence to Peng Cui.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Kush Varshney and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Cui, P., Athey, S. Stable learning establishes some common ground between causal inference and machine learning. Nat Mach Intell 4, 110–115 (2022). https://doi.org/10.1038/s42256-022-00445-z


