
Consistency Regularization for Ensemble Model Based Reinforcement Learning

  • Conference paper
  • First Online:
PRICAI 2021: Trends in Artificial Intelligence (PRICAI 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13033)


Abstract

It is generally believed that model-based reinforcement learning (RL) is more sample efficient than model-free RL. However, model-based RL methods typically suffer from model bias, which severely limits the asymptotic performance of the algorithm. Although previous model-based RL approaches use ensemble models to reduce model error, we find that vanilla ensemble learning does not account for the discrepancy between models: the individual models can disagree substantially, which hinders policy optimization. To alleviate this problem, this paper proposes an Ensemble Model Consistency Actor-Critic (EMC-AC) method that decreases the discrepancy between models while maintaining model diversity. We design ablation experiments to analyze how the trade-off between diversity and consistency affects the performance of EMC-AC. Finally, extensive experiments on continuous control benchmarks demonstrate that our approach exceeds the sample efficiency of prior model-based RL methods and matches the asymptotic performance of state-of-the-art model-free RL algorithms.
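The paper's exact formulation of the consistency regularizer is not reproduced on this page, but the idea can be illustrated with a minimal sketch: train an ensemble of probabilistic dynamics models by maximum likelihood and add a penalty on the pairwise disagreement between the members' predictions, weighted by a coefficient that trades consistency off against diversity. The sketch below assumes PyTorch and Gaussian dynamics models; all names (GaussianDynamicsModel, ensemble_loss, consistency_coef) are illustrative placeholders rather than the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): an ensemble of Gaussian dynamics
# models trained with a negative log-likelihood loss plus a consistency term
# that penalises pairwise disagreement between the members' predicted means.
# Diversity is preserved by the per-model likelihood term together with
# independent initialisation and bootstrapped minibatches.
import torch
import torch.nn as nn


class GaussianDynamicsModel(nn.Module):
    """Predicts a diagonal-Gaussian distribution over the next state."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 200):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean_head = nn.Linear(hidden, state_dim)
        self.logvar_head = nn.Linear(hidden, state_dim)

    def forward(self, state, action):
        h = self.body(torch.cat([state, action], dim=-1))
        return self.mean_head(h), self.logvar_head(h).clamp(-10.0, 2.0)


def ensemble_loss(models, state, action, next_state, consistency_coef=0.1):
    """Average Gaussian NLL plus a pairwise mean-squared consistency penalty."""
    means, logvars = zip(*(m(state, action) for m in models))

    # Maximum-likelihood term: fit each member to the observed transition.
    nll = sum(
        (((next_state - mu) ** 2) * torch.exp(-lv) + lv).mean()
        for mu, lv in zip(means, logvars)
    ) / len(models)

    # Consistency regularisation: shrink the discrepancy between member means.
    consistency, n_pairs = 0.0, 0
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            consistency = consistency + ((means[i] - means[j]) ** 2).mean()
            n_pairs += 1
    return nll + consistency_coef * consistency / max(n_pairs, 1)
```

In such a scheme, consistency_coef controls the trade-off studied in the paper's ablations: a value of zero recovers vanilla ensemble training, while an overly large value collapses the members onto each other and removes the benefit of the ensemble.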



Acknowledgement

This work is funded by the National Natural Science Foundation of China (Grant No. 61876181), the Beijing Nova Program of Science and Technology under Grant No. Z191100001119043, and in part by the Youth Innovation Promotion Association, CAS.

Author information

Corresponding authors

Correspondence to Junge Zhang or Xiu Li.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Jia, R., Li, Q., Huang, W., Zhang, J., Li, X. (2021). Consistency Regularization for Ensemble Model Based Reinforcement Learning. In: Pham, D.N., Theeramunkong, T., Governatori, G., Liu, F. (eds) PRICAI 2021: Trends in Artificial Intelligence. PRICAI 2021. Lecture Notes in Computer Science (LNAI), vol 13033. Springer, Cham. https://doi.org/10.1007/978-3-030-89370-5_1


  • DOI: https://doi.org/10.1007/978-3-030-89370-5_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89369-9

  • Online ISBN: 978-3-030-89370-5

  • eBook Packages: Computer Science, Computer Science (R0)
