Abstract
Learning an accurate dynamics model is the key task in model-based reinforcement learning (MBRL). Most existing MBRL methods learn the dynamics model directly over states. In most cases, however, the relationships among states are complex, because the states are shaped by interactions among various factors in the environment. Some recent works instead learn the dynamics model in a latent representation space, but the learned model is dense and may contain spurious associations between latent representations. To address these problems, we introduce a latent causal dynamics model over latent representations and provide a learning method for MBRL. Specifically, we first learn the latent representations from the observed state space. Second, we learn a latent causal dynamics model among the latent representations using a causal discovery method. Finally, the latent causal dynamics model is used to aid policy learning. These steps are iterated, optimizing a unified loss function until convergence. Experimental results on four tasks show that our proposed method benefits from both causality and the learned latent representations.
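To make the second step more concrete, the following is a minimal sketch of learning a sparse dynamics model over latent variables. It is not the method of the paper: the abstract does not specify the causal discovery procedure, so thresholded least squares is swapped in as a simple stand-in, the latent variables are assumed to be already available and linearly related, and all names (`solve`, `fit_sparse_dynamics`, `A_true`, `threshold`) are hypothetical.

```python
import random


def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_sparse_dynamics(Z, Z_next, threshold=0.1):
    """Fit z_next = A z by least squares, then zero out weak edges.

    Thresholding is an assumed stand-in for a causal discovery method.
    mask[k][i] is True when latent i is kept as a parent of latent k.
    """
    d = len(Z[0])
    gram = [[sum(z[i] * z[j] for z in Z) for j in range(d)] for i in range(d)]
    A, mask = [], []
    for k in range(d):
        rhs = [sum(z[i] * zn[k] for z, zn in zip(Z, Z_next)) for i in range(d)]
        row = solve(gram, rhs)
        keep = [abs(w) > threshold for w in row]
        mask.append(keep)
        A.append([w if m else 0.0 for w, m in zip(row, keep)])
    return A, mask


# Synthetic latent transitions from a known sparse linear model (illustrative).
random.seed(0)
A_true = [[0.9, 0.0, 0.0],
          [0.5, 0.8, 0.0],
          [0.0, 0.0, 0.7]]
Z = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(500)]
Z_next = [[sum(a * z for a, z in zip(row, zt)) + random.gauss(0, 0.01)
           for row in A_true] for zt in Z]

A_hat, mask = fit_sparse_dynamics(Z, Z_next)
for row in mask:
    print(row)
```

In this toy setting a dense fit would assign a small nonzero weight to every pair of latents; the mask keeps only the edges whose estimated strength exceeds the threshold, which is the kind of sparsity the abstract argues for. Representation learning and policy learning are omitted here.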
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Hao, Z., Zhu, H., Chen, W., Cai, R. (2024). Latent Causal Dynamics Model for Model-Based Reinforcement Learning. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14448. Springer, Singapore. https://doi.org/10.1007/978-981-99-8082-6_17
DOI: https://doi.org/10.1007/978-981-99-8082-6_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8081-9
Online ISBN: 978-981-99-8082-6
eBook Packages: Computer Science (R0)