Abstract
Generalizability is a formidable challenge in applying reinforcement learning to the real world. A root cause of poor generalization performance in reinforcement learning is that generalizing from a limited number of training conditions to unseen test conditions induces implicit partial observability, effectively transforming even a fully observed Markov Decision Process (MDP) into a Partially Observable Markov Decision Process (POMDP). To address this issue, we propose a novel structure, the Context-adapted Multi-policy Ensemble Method (CAMPE), which enables the model to adapt to changes in the environment and efficiently handle the implicit partial observability that arises during generalization. The method captures local dynamics changes by learning a contextual environment latent variable, equipping the model with the ability to adapt to the environment. The latent variable, together with samples carrying contextual information, is used as input to the policy. Multiple policies are trained and combined into a single ensemble policy that approximately solves the partial observability problem. We demonstrate our method on various simulated robotics and control tasks. Experimental results show that our method achieves superior generalization ability.
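The abstract's pipeline (infer a context latent from recent transitions, condition each ensemble member on it, then aggregate the members into one policy) can be sketched in minimal form. The paper gives no implementation details here, so everything below is an assumption for illustration: the linear encoder, mean-pooling over the transition window, tanh policies, and action averaging are all hypothetical stand-ins, not CAMPE's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM, LATENT_DIM, N_POLICIES = 8, 2, 4, 3

# Hypothetical context encoder: embeds each recent transition
# (obs, action, next_obs) and mean-pools into one latent z that
# summarizes the local dynamics of the current environment.
W_enc = rng.normal(size=(2 * OBS_DIM + ACT_DIM, LATENT_DIM))

def encode_context(transitions):
    """Mean-pool a nonlinear embedding of each transition into a latent z."""
    feats = np.array([np.concatenate([o, a, o2]) for o, a, o2 in transitions])
    return np.tanh(feats @ W_enc).mean(axis=0)

# Ensemble of (here, linear) policies, each conditioned on (obs, z).
W_pols = [rng.normal(size=(OBS_DIM + LATENT_DIM, ACT_DIM))
          for _ in range(N_POLICIES)]

def ensemble_action(obs, z):
    """Combine the member policies' actions into a single ensemble action."""
    x = np.concatenate([obs, z])
    return np.mean([np.tanh(x @ W) for W in W_pols], axis=0)

# Usage: infer the latent from a short interaction window, then act.
window = [(rng.normal(size=OBS_DIM), rng.normal(size=ACT_DIM),
           rng.normal(size=OBS_DIM)) for _ in range(5)]
z = encode_context(window)
action = ensemble_action(rng.normal(size=OBS_DIM), z)
```

The design point this sketch illustrates is that the policy never sees the hidden environment parameters directly; it only sees `z`, an inferred summary, which is how the approach approximates acting in the induced POMDP.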
CAS Project for Young Scientists in Basic Research, Grant No. YSBR-040.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Xu, T., Wu, F., Zhao, J. (2023). Context-Adapted Multi-policy Ensemble Method for Generalization in Reinforcement Learning. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13623. Springer, Cham. https://doi.org/10.1007/978-3-031-30105-6_34
Print ISBN: 978-3-031-30104-9
Online ISBN: 978-3-031-30105-6