Latent Causal Dynamics Model for Model-Based Reinforcement Learning

Hao, Zhifeng; Zhu, Haipeng; Chen, Wei; Cai, Ruichu

doi:10.1007/978-981-99-8082-6_17


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14448)

Included in the following conference series:

International Conference on Neural Information Processing


Abstract

Learning an accurate dynamics model is the key task in model-based reinforcement learning (MBRL). Most existing MBRL methods learn the dynamics model directly over states. In most cases, however, the relationships among states are complex, because the states are affected by the interaction of various factors in the environment. Some recent works instead learn the dynamics model in a latent representation space, but the learned model is dense and may contain spurious associations between latent representations. To address these problems, we introduce a latent causal dynamics model over latent representations and provide a learning method for MBRL. Specifically, we first learn the latent representations from the observed state space. Second, we learn a latent causal dynamics model among the latent representations using a causal discovery method. Finally, the latent causal dynamics model is used to aid policy learning. These steps are iterated, updating a unified loss function until convergence. Experimental results on four tasks show that the performance of our proposed method benefits from both the causality and the learned latent representations.
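
The contrast between a dense latent dynamics model and a sparse, causally masked one can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a linear latent transition, omits actions and the encoder, and replaces the causal discovery method with simple magnitude thresholding. All names (`prune_edges`, `masked_step`, `rollout`, `threshold`) and all numbers are hypothetical.

```python
# Illustrative sketch only: a linear latent transition with a binary causal mask.
# Every name and value here is an assumption, not taken from the paper.

def prune_edges(weights, threshold=0.1):
    """Stand-in for causal discovery: keep edge j -> i only if |w_ij| >= threshold."""
    return [[1 if abs(w) >= threshold else 0 for w in row] for row in weights]

def masked_step(z, weights, mask):
    """Predict the next latent state: z'_i = sum_j mask_ij * w_ij * z_j."""
    return [
        sum(m * w * zj for m, w, zj in zip(mask_row, w_row, z))
        for mask_row, w_row in zip(mask, weights)
    ]

def rollout(z0, weights, mask, horizon):
    """Roll the masked model forward for `horizon` steps, as a planner might."""
    traj = [z0]
    for _ in range(horizon):
        traj.append(masked_step(traj[-1], weights, mask))
    return traj

# Dense weights such as a fitted model might produce; the small entries
# play the role of spurious associations between latent variables.
W = [[0.5, 0.02, 0.0],
     [0.25, 0.5, -0.03],
     [0.01, 0.0, 1.0]]
M = prune_edges(W)  # sparse graph over latent variables
trajectory = rollout([1.0, 2.0, 4.0], W, M, horizon=2)
```

In this sketch the mask zeroes the weak entries, so each latent variable is predicted only from the variables retained as its parents. A real system would learn the encoder, the graph, and the policy jointly, as the abstract describes.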



Author information

Authors and Affiliations

School of Computer Science, Guangdong University of Technology, Guangzhou, China
Zhifeng Hao, Haipeng Zhu, Wei Chen & Ruichu Cai
College of Engineering, Shantou University, Shantou, China
Zhifeng Hao


Corresponding authors

Correspondence to Wei Chen or Ruichu Cai.

Editor information

Editors and Affiliations

Central South University, Changsha, China
Biao Luo
Chinese Academy of Sciences, Beijing, China
Long Cheng
Zhejiang University, Hangzhou, China
Zheng-Guang Wu
Guangdong University of Technology, Guangzhou, China
Hongyi Li
UNSW Sydney, Sydney, NSW, Australia
Chaojie Li


About this paper

Cite this paper

Hao, Z., Zhu, H., Chen, W., Cai, R. (2024). Latent Causal Dynamics Model for Model-Based Reinforcement Learning. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14448. Springer, Singapore. https://doi.org/10.1007/978-981-99-8082-6_17


DOI: https://doi.org/10.1007/978-981-99-8082-6_17
Published: 15 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8081-9
Online ISBN: 978-981-99-8082-6
eBook Packages: Computer Science, Computer Science (R0)
