
Context-Adapted Multi-policy Ensemble Method for Generalization in Reinforcement Learning

  • Conference paper
Neural Information Processing (ICONIP 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13623)


Abstract

Generalizability is a formidable challenge in applying reinforcement learning to the real world. The root cause of poor generalization in reinforcement learning is that generalizing from a limited number of training conditions to unseen test conditions induces implicit partial observability, effectively turning even a fully observed Markov Decision Process (MDP) into a Partially Observable Markov Decision Process (POMDP). To address this issue, we propose a novel structure, the Context-adapted Multi-policy Ensemble Method (CAMPE), which enables the model to adapt to changes in the environment and efficiently handle the implicit partial observability that arises during generalization. The method captures local changes in the dynamics by learning a contextual latent variable of the environment, equipping the model with the ability to adapt to its environment. The latent variable, together with samples carrying contextual information, is used as the input of the policy. Multiple policies are trained and combined into an ensemble to obtain a single policy that approximately resolves the partial observability. We demonstrate our method on various simulated robotics and control tasks. Experimental results show that our method achieves superior generalization ability.

Supported by the CAS Project for Young Scientists in Basic Research, Grant No. YSBR-040.
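The pipeline described in the abstract lends itself to a compact sketch: a context encoder summarizes recent transitions into a latent variable, each policy in the ensemble conditions on the current state concatenated with that latent, and the acting policy combines the members' outputs. The sketch below is illustrative only and is not the authors' implementation; the GRU encoder, the network sizes, and the simple mean as the combination rule are all assumptions made for the example.

```python
# Illustrative sketch of a context-conditioned policy ensemble (hypothetical
# architecture; not the CAMPE reference code).
import torch
import torch.nn as nn


class ContextEncoder(nn.Module):
    """Encodes a short history of (state, action, reward) tuples into a latent z."""

    def __init__(self, state_dim, action_dim, latent_dim, hidden_dim=64):
        super().__init__()
        self.rnn = nn.GRU(state_dim + action_dim + 1, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, latent_dim)

    def forward(self, history):
        # history: (batch, steps, state_dim + action_dim + 1)
        _, h = self.rnn(history)           # h: (1, batch, hidden_dim)
        return self.head(h.squeeze(0))     # z: (batch, latent_dim)


class ContextConditionedPolicy(nn.Module):
    """A policy that acts on the current state concatenated with the context latent."""

    def __init__(self, state_dim, latent_dim, action_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, action_dim),
            nn.Tanh(),                     # bounded continuous-control actions
        )

    def forward(self, state, z):
        return self.net(torch.cat([state, z], dim=-1))


class PolicyEnsemble(nn.Module):
    """Combines several context-conditioned policies into a single acting policy."""

    def __init__(self, state_dim, action_dim, latent_dim, n_policies=3):
        super().__init__()
        self.encoder = ContextEncoder(state_dim, action_dim, latent_dim)
        self.policies = nn.ModuleList(
            ContextConditionedPolicy(state_dim, latent_dim, action_dim)
            for _ in range(n_policies)
        )

    def act(self, state, history):
        z = self.encoder(history)
        actions = torch.stack([p(state, z) for p in self.policies], dim=0)
        return actions.mean(dim=0)         # mean used here as the ensemble rule


if __name__ == "__main__":
    state_dim, action_dim, latent_dim = 11, 3, 8       # hypothetical sizes
    model = PolicyEnsemble(state_dim, action_dim, latent_dim)
    state = torch.randn(1, state_dim)
    history = torch.randn(1, 5, state_dim + action_dim + 1)  # last 5 transitions
    print(model.act(state, history).shape)              # torch.Size([1, 3])
```

In a training loop under these assumptions, each ensemble member would be updated with a standard policy-gradient objective on its context-augmented inputs, while the encoder is trained to predict or reconstruct the local dynamics from recent transitions.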



Author information


Correspondence to Fengge Wu.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Xu, T., Wu, F., Zhao, J. (2023). Context-Adapted Multi-policy Ensemble Method for Generalization in Reinforcement Learning. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13623. Springer, Cham. https://doi.org/10.1007/978-3-031-30105-6_34

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-30105-6_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30104-9

  • Online ISBN: 978-3-031-30105-6

  • eBook Packages: Computer Science, Computer Science (R0)
