Abstract
Generalizability is a formidable challenge in applying reinforcement learning to the real world. A root cause of poor generalization performance in reinforcement learning is that generalizing from a limited number of training conditions to unseen test conditions induces implicit partial observability, effectively transforming even a fully observed Markov Decision Process (MDP) into a Partially Observable Markov Decision Process (POMDP). To address this issue, we propose a novel structure, the Context-adapted Multi-policy Ensemble Method (CAMPE), which enables the model to adapt to changes in the environment and efficiently handle the implicit partial observability that arises during generalization. The method captures local dynamics changes by learning a contextual environment latent variable, equipping the model with the ability to adapt to the environment. The latent variable, together with samples carrying contextual information, is used as input to the policy. Multiple policies are trained and combined into a single ensemble policy that approximately solves the partial observability problem. We demonstrate our method on various simulated robotics and control tasks. Experimental results show that our method achieves superior generalization ability.
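The abstract's pipeline (infer a context latent from recent transitions, condition each ensemble member on it, then aggregate the members into one policy) can be sketched in minimal form. The paper gives no implementation details here, so everything below is an assumption for illustration: the linear encoder, mean-pooling over the transition window, tanh policies, and action averaging are all hypothetical stand-ins, not CAMPE's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM, LATENT_DIM, N_POLICIES = 8, 2, 4, 3

# Hypothetical context encoder: embeds each recent transition
# (obs, action, next_obs) and mean-pools into one latent z that
# summarizes the local dynamics of the current environment.
W_enc = rng.normal(size=(2 * OBS_DIM + ACT_DIM, LATENT_DIM))

def encode_context(transitions):
    """Mean-pool a nonlinear embedding of each transition into a latent z."""
    feats = np.array([np.concatenate([o, a, o2]) for o, a, o2 in transitions])
    return np.tanh(feats @ W_enc).mean(axis=0)

# Ensemble of (here, linear) policies, each conditioned on (obs, z).
W_pols = [rng.normal(size=(OBS_DIM + LATENT_DIM, ACT_DIM))
          for _ in range(N_POLICIES)]

def ensemble_action(obs, z):
    """Combine the member policies' actions into a single ensemble action."""
    x = np.concatenate([obs, z])
    return np.mean([np.tanh(x @ W) for W in W_pols], axis=0)

# Usage: infer the latent from a short interaction window, then act.
window = [(rng.normal(size=OBS_DIM), rng.normal(size=ACT_DIM),
           rng.normal(size=OBS_DIM)) for _ in range(5)]
z = encode_context(window)
action = ensemble_action(rng.normal(size=OBS_DIM), z)
```

The design point this sketch illustrates is that the policy never sees the hidden environment parameters directly; it only sees `z`, an inferred summary, which is how the approach approximates acting in the induced POMDP.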
CAS Project for Young Scientists in Basic Research, Grant No. YSBR-040.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Xu, T., Wu, F., Zhao, J. (2023). Context-Adapted Multi-policy Ensemble Method for Generalization in Reinforcement Learning. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13623. Springer, Cham. https://doi.org/10.1007/978-3-031-30105-6_34
Print ISBN: 978-3-031-30104-9
Online ISBN: 978-3-031-30105-6