Meta-Reinforcement Learning Algorithm Based on Reward and Dynamic Inference

Chen, Jinhao; Zhang, Chunhong; Hu, Zheng

doi:10.1007/978-981-97-2259-4_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14647)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Meta-Reinforcement Learning aims to rapidly address unseen tasks that share similar structures. However, the agent relies heavily on a large amount of experience during the meta-training phase, which makes high sample efficiency a formidable challenge. Current methods typically adapt to novel tasks within the Meta-Reinforcement Learning framework through task inference. Unfortunately, these approaches still exhibit limitations when faced with a high-complexity task space. In this paper, we propose a Meta-Reinforcement Learning method based on reward and dynamic inference. We introduce independent reward and dynamic inference encoders, each of which samples specific context information to capture the deep-level features of task goals and dynamics, respectively. By reducing the task inference space, the agent effectively learns the structures shared across tasks and acquires a profound understanding of the differences between tasks. We illustrate the performance degradation caused by high task inference complexity and demonstrate that our method outperforms previous algorithms in terms of sample efficiency.
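The idea of independent reward and dynamic inference encoders can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the one-layer tanh encoder, the mean pooling, the choice of (s, a, r) tuples for the reward encoder and (s, a, s') tuples for the dynamics encoder, and every name below (`make_encoder`, `encode`, `task_embedding`) are assumptions made purely for illustration.

```python
import math
import random

random.seed(0)

def make_encoder(in_dim, latent_dim):
    """Random weights for a one-layer tanh encoder (hypothetical architecture)."""
    return [[random.gauss(0.0, 0.1) for _ in range(in_dim)]
            for _ in range(latent_dim)]

def encode(weights, context):
    """Embed each context row, then mean-pool so row order does not matter."""
    rows = [[math.tanh(sum(w * x for w, x in zip(w_row, row)))
             for w_row in weights]
            for row in context]
    return [sum(col) / len(rows) for col in zip(*rows)]

def random_vec(dim):
    """Draw a toy feature vector; real data would come from rollouts."""
    return [random.gauss(0.0, 1.0) for _ in range(dim)]

STATE_DIM, ACTION_DIM, LATENT_DIM, N = 3, 2, 4, 8

# Toy transitions (s, a, r, s'), assumed here only to have something to encode.
transitions = [
    (random_vec(STATE_DIM), random_vec(ACTION_DIM),
     random.gauss(0.0, 1.0), random_vec(STATE_DIM))
    for _ in range(N)
]

# Assumption: the reward encoder sees only (s, a, r) and the dynamics
# encoder sees only (s, a, s'), so each infers over a smaller space.
reward_ctx = [s + a + [r] for s, a, r, _ in transitions]
dynamics_ctx = [s + a + s2 for s, a, _, s2 in transitions]

reward_enc = make_encoder(STATE_DIM + ACTION_DIM + 1, LATENT_DIM)
dynamics_enc = make_encoder(2 * STATE_DIM + ACTION_DIM, LATENT_DIM)

z_reward = encode(reward_enc, reward_ctx)
z_dynamics = encode(dynamics_enc, dynamics_ctx)

# A policy could be conditioned on both latents, here by concatenation.
task_embedding = z_reward + z_dynamics
```

In this sketch each encoder receives only the part of the context relevant to its factor, which is one plausible reading of "reducing the task inference space"; how the paper samples context and trains the encoders is not stated in the abstract.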



Acknowledgments

This work was supported by Beijing University of Posts and Telecommunications China Mobile Research Institute Joint Innovation Center.

Author information

Authors and Affiliations

State Key Laboratory Of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, 100088, China
Jinhao Chen & Zheng Hu
Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, 100088, China
Chunhong Zhang

Authors

Jinhao Chen
Chunhong Zhang
Zheng Hu

Corresponding author

Correspondence to Zheng Hu.

Editor information

Editors and Affiliations

Academia Sinica, Taipei, Taiwan
De-Nian Yang
Microsoft Research Asia, Beijing, China
Xing Xie
National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Vincent S. Tseng
Duke University, Durham, NC, USA
Jian Pei
National Cheng Kung University, Tainan, Taiwan
Jen-Wei Huang
Silesian University of Technology, Gliwice, Poland
Jerry Chun-Wei Lin

About this paper

Cite this paper

Chen, J., Zhang, C., Hu, Z. (2024). Meta-Reinforcement Learning Algorithm Based on Reward and Dynamic Inference. In: Yang, DN., Xie, X., Tseng, V.S., Pei, J., Huang, JW., Lin, J.CW. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol 14647. Springer, Singapore. https://doi.org/10.1007/978-981-97-2259-4_17

DOI: https://doi.org/10.1007/978-981-97-2259-4_17
Published: 25 April 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2261-7
Online ISBN: 978-981-97-2259-4
eBook Packages: Computer Science, Computer Science (R0)
