Enhanced causal effects estimation based on offline reinforcement learning

Applied Intelligence

Abstract

Causal effects estimation quantifies the effect of a treatment (intervention) on an outcome, but traditional methods often rely on the strong assumption that there are no unobserved confounding factors. We propose ECEE-RL (Enhanced Causal Effects Estimation based on Reinforcement Learning), a novel architecture that leverages offline reinforcement learning to relax this assumption. ECEE-RL models causal effects estimation as a stateless Markov Decision Process, allowing for adaptive policy optimization through action-reward combinations. By framing estimation as "actions" and sensitivity analysis results as "rewards", ECEE-RL minimizes sensitivity to confounders, including unobserved ones. Theoretical analysis confirms the convergence and robustness of ECEE-RL. Experiments on two simulated datasets demonstrate significant improvements over baseline methods, with reductions in the mean squared error (MSE) of the conditional average treatment effect (CATE) ranging from 5.45% to 66.55% and sensitivity significance reductions of up to 98.29%. These results corroborate our theoretical findings on ECEE-RL's improved accuracy and robustness. Application to real-world pilot-aircraft interaction data reveals significant causal effects of control behaviors on bioelectrical signals and emotions, demonstrating ECEE-RL's practical utility. Although computationally intensive, ECEE-RL offers a promising approach to causal effects estimation in scenarios where unobserved confounding may be present, and it represents an important step towards more reliable causal inference in complex real-world settings.
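The stateless framing in the abstract can be illustrated with a minimal sketch. This is not the authors' ECEE-RL algorithm, which is not reproduced in this preview; it only shows that, with no state and no transition, the value of an action reduces to its expected reward and can be fitted from a fixed (offline) log of action-reward pairs. The function names, the action labels, and the toy reward values are all hypothetical, and the reward is assumed to be a negated sensitivity score.

```python
from collections import defaultdict


def fit_stateless_q(logged_pairs):
    """Estimate Q(a) as the running mean reward of each action in a fixed log.

    With no state and no transition there is no bootstrapped next-state term,
    so the Q-value of an action reduces to its expected reward.
    """
    q = defaultdict(float)     # running mean reward per action
    counts = defaultdict(int)  # number of logged samples per action
    for action, reward in logged_pairs:
        counts[action] += 1
        q[action] += (reward - q[action]) / counts[action]
    return dict(q)


def greedy_action(q_values):
    """Return the action with the highest estimated value."""
    return max(q_values, key=q_values.get)


# Toy offline log (made-up numbers, for illustration only): each action names
# a hypothetical candidate estimate, and each reward is the negated sensitivity
# score of that estimate, so a less confounder-sensitive estimate earns more.
toy_log = [
    ("estimate_a", -0.50),
    ("estimate_b", -0.10),
    ("estimate_a", -0.40),
    ("estimate_b", -0.20),
]

q_values = fit_stateless_q(toy_log)
best = greedy_action(q_values)
```

In this toy log the greedy choice is the candidate whose logged sensitivity scores are smallest in magnitude. A practical system would replace the hand-written log with rewards produced by an actual sensitivity analysis.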




Data availability

This study utilizes two simulated datasets to support its findings: the IBM Causal Inference Benchmarking Framework, openly available at https://github.com/IBM-HRL-MLHLS/IBM-Causal-Inference-Benchmarking-Framework, and the Infant Health and Development Program (IHDP) dataset, available at https://raw.githubusercontent.com/AMLab-Amsterdam/CEVAE/master/datasets/IHDP/csv/ihdp_npci_1.csv.

The real dataset used to support the findings of this study is available from the corresponding author upon request.
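As a rough sketch of how the IHDP file might be used to score an estimator, the snippet below parses IHDP-style rows and computes a mean squared error between predicted and true individual effects. It assumes the header-less column order used by the CEVAE repository (treatment, factual outcome, counterfactual outcome, mu0, mu1, then covariates); that layout, the function names, and the two sample rows are assumptions for illustration, and the paper's exact CATE MSE definition is not visible in this preview.

```python
import csv
import io


def load_ihdp_rows(text):
    """Parse header-less IHDP-style CSV text into dictionaries.

    Assumed column order: t, y_factual, y_cfactual, mu0, mu1, x1, x2, ...
    """
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        values = [float(v) for v in record]
        rows.append({
            "t": values[0],
            "y_factual": values[1],
            "y_cfactual": values[2],
            "mu0": values[3],
            "mu1": values[4],
            "x": values[5:],
        })
    return rows


def cate_mse(rows, predicted_effects):
    """Mean squared error between predicted effects and mu1 - mu0."""
    true_effects = [row["mu1"] - row["mu0"] for row in rows]
    squared_errors = [
        (pred - true) ** 2
        for pred, true in zip(predicted_effects, true_effects)
    ]
    return sum(squared_errors) / len(squared_errors)


# Two made-up rows in the assumed layout (not real IHDP records).
sample_text = "1,5.0,3.0,3.0,5.0,0.1,0.2\n0,2.0,6.0,2.0,6.0,0.3,0.4\n"
sample_rows = load_ihdp_rows(sample_text)
sample_mse = cate_mse(sample_rows, [2.0, 3.0])
```

To apply this to the real file, the downloaded CSV contents would be passed to `load_ihdp_rows` in place of `sample_text`, after checking that the column order matches the assumption above.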


Acknowledgements

This research was supported by the National Natural Science Foundation of China (Grant No. 62106269).

Author information

Contributions

Huan Xia: Conceptualization, Formal analysis, Methodology, Software, Validation, Visualization, Writing; Chaozhe Jiang: Project administration, Supervision, Resources; Chenyang Zhang: Investigation, Data curation.

Corresponding author

Correspondence to Chaozhe Jiang.

Ethics declarations

Competing interest

The authors declare that they have no conflicts of interest.

Ethical and informed consent for data used

The datasets used in this study include the publicly available IBM Causal Inference Benchmarking dataset, the IHDP dataset, and real pilot operation data. For the public datasets, we follow their open license agreements. The real pilot operation dataset includes measurement data such as control inputs, facial features, and EEG of pilots on the simulator. This dataset was collected and used with the informed consent of all participating pilots, in compliance with the ethical requirements of biological behavior research. All pilot identity information in the dataset has been de-identified. We will use the dataset only for academic analysis and research, not for any commercial purpose, and we will take measures to protect the privacy of the pilots.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Table 2

Table 2 Variables and detailed example

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xia, H., Jiang, C. & Zhang, C. Enhanced causal effects estimation based on offline reinforcement learning. Appl Intell 55, 278 (2025). https://doi.org/10.1007/s10489-024-06009-5

  • DOI: https://doi.org/10.1007/s10489-024-06009-5
