
Offline model-based reinforcement learning with causal structured world models

  • Research Article
  • Published in Frontiers of Computer Science

Abstract

Model-based methods have recently shown promise for offline reinforcement learning (RL), which aims to learn good policies from historical data without interacting with the environment. Previous model-based offline RL methods learn a straightforward prediction model that maps states and actions directly to next-step states. However, such a prediction model tends to capture spurious relations induced by the preferences of the sampling policy behind the offline data. Instead, the environment model should focus on causal influences, which facilitates learning an effective policy that generalizes well to unseen states. In this paper, we first provide theoretical results showing that causal environment models can outperform plain environment models in offline RL, by incorporating the causal structure into the generalization error bound. We also propose a practical algorithm, oFfline mOdel-based reinforcement learning with CaUsal Structured World Models (FOCUS), to illustrate the feasibility of learning and leveraging causal structure in offline RL. Experimental results on two benchmarks show that FOCUS recovers the underlying causal structure accurately and robustly and, as a result, outperforms both plain model-based offline RL algorithms and other causal model-based offline RL algorithms.
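The abstract describes the core idea at a high level: constrain the learned dynamics model so that each next-state variable depends only on its causal parents among the state and action dimensions, rather than on every input the behavior policy happened to correlate with it. The sketch below is a minimal illustration of such a causally masked dynamics model, not the authors' FOCUS implementation; the class name CausalDynamicsModel, the per-dimension MLP heads, and the binary mask are illustrative assumptions, and the mask is assumed to come from a separate causal discovery step on the offline data (e.g., conditional independence tests).

```python
# Minimal sketch (assumed, not the paper's code) of a causal structured
# world model: each next-state dimension is predicted only from the
# state/action dimensions marked as its causal parents in a binary mask.
# The mask itself would be estimated from offline data, e.g. via
# conditional independence tests; here it is taken as given.
import torch
import torch.nn as nn


class CausalDynamicsModel(nn.Module):
    def __init__(self, state_dim: int, action_dim: int,
                 causal_mask: torch.Tensor, hidden: int = 256):
        """causal_mask: (state_dim, state_dim + action_dim) binary matrix;
        entry [j, i] = 1 means input dimension i is a causal parent of
        next-state dimension j."""
        super().__init__()
        assert causal_mask.shape == (state_dim, state_dim + action_dim)
        self.register_buffer("mask", causal_mask.float())
        # One small MLP per next-state dimension, fed only by its parents.
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),
            )
            for _ in range(state_dim)
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        x = torch.cat([state, action], dim=-1)        # (batch, state_dim + action_dim)
        preds = []
        for j, head in enumerate(self.heads):
            preds.append(head(x * self.mask[j]))      # zero out non-parent inputs
        return torch.cat(preds, dim=-1)               # predicted next state


if __name__ == "__main__":
    # Illustrative dimensions and a placeholder (fully connected) mask.
    s_dim, a_dim = 4, 2
    mask = torch.ones(s_dim, s_dim + a_dim)
    model = CausalDynamicsModel(s_dim, a_dim, mask)
    s, a = torch.randn(8, s_dim), torch.randn(8, a_dim)
    next_s_pred = model(s, a)
    loss = nn.functional.mse_loss(next_s_pred, torch.randn(8, s_dim))
    loss.backward()
```

Once fit by regression on offline transitions, such a masked model could be plugged into a standard model-based offline RL loop (for example, pessimistic short-horizon rollouts in the MOPO style); the point of the mask is that non-causal inputs, however strongly correlated under the behavior policy, never enter the prediction.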



Author information


Corresponding author

Correspondence to Yang Yu.

Ethics declarations

Competing interests: The authors declare that they have no competing interests or financial conflicts to disclose.

Additional information

Zhengmao Zhu received the BSc degree from the Department of Mathematical Science, Zhejiang University, China in June 2018. He is currently pursuing the PhD degree with the School of Artificial Intelligence, Nanjing University, China. His current research interests mainly include reinforcement learning and causal learning. His work has been accepted at top artificial intelligence conferences, including NeurIPS and AAAI. He has served as a reviewer for NeurIPS, ICML, AAAI, etc.

Honglong Tian received the BSc degree from the Software Institute of Nanjing University, China in June 2022. He is currently a graduate student at the Software Institute of Nanjing University, China. His current research interests mainly include reinforcement learning. His work has been accepted at top artificial intelligence conferences, including NeurIPS and AAAI.

Xionghui Chen received the BSc degree from Southeast University, China in 2018. He is currently working towards the PhD degree in the National Key Lab for Novel Software Technology, School of Artificial Intelligence, Nanjing University, China. His research focuses on the challenges of applying reinforcement learning in real-world applications. His work has been accepted at top artificial intelligence conferences, including NeurIPS, AAMAS, DAI, and KDD. He has served as a reviewer for NeurIPS, IJCAI, KDD, DAI, etc.

Kun Zhang received the BS degree in automation from the University of Science and Technology of China, China in 2001, and the PhD degree in computer science from The Chinese University of Hong Kong, China in 2005. He is currently an associate professor in the Philosophy Department and an affiliate faculty member in the Machine Learning Department, Carnegie Mellon University, Pittsburgh, USA. His research interests lie in causality, machine learning, and artificial intelligence, especially causal discovery, hidden causal representation learning, transfer learning, and general-purpose artificial intelligence.

Yang Yu received the PhD degree in computer science from Nanjing University, China in 2011, and is currently a professor at the School of Artificial Intelligence, Nanjing University, China. His research interests include machine learning, mainly reinforcement learning and derivative-free optimization for learning. Prof. Yu was granted the CCF-IEEE CS Young Scientist Award in 2020, recognized as one of AI's 10 to Watch by IEEE Intelligent Systems, and received the PAKDD Early Career Award in 2018. His team won the championship of the 2018 OpenAI Retro Contest on transfer reinforcement learning and the 2021 ICAPS Learning to Run a Power Network Challenge with Trust. He has served as an area chair for NeurIPS, ICML, IJCAI, AAAI, etc.

Electronic supplementary material


About this article


Cite this article

Zhu, Z., Tian, H., Chen, X. et al. Offline model-based reinforcement learning with causal structured world models. Front. Comput. Sci. 19, 194347 (2025). https://doi.org/10.1007/s11704-024-3946-y


  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11704-024-3946-y

Keywords