
Consistency Regularization for Ensemble Model Based Reinforcement Learning

  • Conference paper
  • First Online:
PRICAI 2021: Trends in Artificial Intelligence (PRICAI 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13033)


Abstract

It is generally believed that model-based reinforcement learning (RL) is more sample efficient than model-free RL. However, model-based RL methods typically suffer from model bias, which severely limits the asymptotic performance of the algorithm. Although previous model-based RL approaches use ensemble models to reduce model error, we find that vanilla ensemble learning does not account for the discrepancy between models: the individual models can disagree substantially, which hinders policy optimization. To alleviate this problem, this paper proposes an Ensemble Model Consistency Actor-Critic (EMC-AC) method that decreases the discrepancy between models while maintaining model diversity. We design ablation experiments to analyze how the trade-off between diversity and consistency affects the performance of EMC-AC. Finally, extensive experiments on continuous control benchmarks demonstrate that our approach exceeds the sample efficiency of prior model-based RL methods and matches the asymptotic performance of state-of-the-art model-free RL algorithms.
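The paper's exact formulation of the consistency regularizer is not reproduced on this page, but the idea can be illustrated with a minimal sketch: train an ensemble of probabilistic dynamics models by maximum likelihood and add a penalty on the pairwise disagreement between the members' predictions, weighted by a coefficient that trades consistency off against diversity. The sketch below assumes PyTorch and Gaussian dynamics models; all names (GaussianDynamicsModel, ensemble_loss, consistency_coef) are illustrative placeholders rather than the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): an ensemble of Gaussian dynamics
# models trained with a negative log-likelihood loss plus a consistency term
# that penalises pairwise disagreement between the members' predicted means.
# Diversity is preserved by the per-model likelihood term together with
# independent initialisation and bootstrapped minibatches.
import torch
import torch.nn as nn


class GaussianDynamicsModel(nn.Module):
    """Predicts a diagonal-Gaussian distribution over the next state."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 200):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean_head = nn.Linear(hidden, state_dim)
        self.logvar_head = nn.Linear(hidden, state_dim)

    def forward(self, state, action):
        h = self.body(torch.cat([state, action], dim=-1))
        return self.mean_head(h), self.logvar_head(h).clamp(-10.0, 2.0)


def ensemble_loss(models, state, action, next_state, consistency_coef=0.1):
    """Average Gaussian NLL plus a pairwise mean-squared consistency penalty."""
    means, logvars = zip(*(m(state, action) for m in models))

    # Maximum-likelihood term: fit each member to the observed transition.
    nll = sum(
        (((next_state - mu) ** 2) * torch.exp(-lv) + lv).mean()
        for mu, lv in zip(means, logvars)
    ) / len(models)

    # Consistency regularisation: shrink the discrepancy between member means.
    consistency, n_pairs = 0.0, 0
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            consistency = consistency + ((means[i] - means[j]) ** 2).mean()
            n_pairs += 1
    return nll + consistency_coef * consistency / max(n_pairs, 1)
```

In such a scheme, consistency_coef controls the trade-off studied in the paper's ablations: a value of zero recovers vanilla ensemble training, while an overly large value collapses the members onto each other and removes the benefit of the ensemble.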



Acknowledgement

This work is funded by the National Natural Science Foundation of China (Grant No. 61876181), the Beijing Nova Program of Science and Technology under Grant No. Z191100001119043, and in part by the Youth Innovation Promotion Association, CAS.

Author information

Corresponding authors

Correspondence to Junge Zhang or Xiu Li.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Jia, R., Li, Q., Huang, W., Zhang, J., Li, X. (2021). Consistency Regularization for Ensemble Model Based Reinforcement Learning. In: Pham, D.N., Theeramunkong, T., Governatori, G., Liu, F. (eds) PRICAI 2021: Trends in Artificial Intelligence. PRICAI 2021. Lecture Notes in Computer Science (LNAI), vol 13033. Springer, Cham. https://doi.org/10.1007/978-3-030-89370-5_1


  • DOI: https://doi.org/10.1007/978-3-030-89370-5_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89369-9

  • Online ISBN: 978-3-030-89370-5

  • eBook Packages: Computer Science, Computer Science (R0)
