Abstract
Meta-Reinforcement Learning aims to adapt rapidly to unseen tasks that share a common structure. However, the agent relies heavily on a large amount of experience during the meta-training phase, making high sample efficiency a formidable challenge. Current methods typically adapt to novel tasks within the Meta-Reinforcement Learning framework through task inference. Unfortunately, these approaches still exhibit limitations when faced with a high-complexity task space. In this paper, we propose a Meta-Reinforcement Learning method based on reward and dynamics inference. We introduce independent reward and dynamics inference encoders, which sample task-specific context information to capture deep features of task goals and dynamics. By reducing the task inference space, the agent effectively learns the structure shared across tasks and acquires a deeper understanding of the differences between them. We illustrate the performance degradation caused by high task inference complexity and demonstrate that our method outperforms previous algorithms in terms of sample efficiency.
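The core idea of separate reward and dynamics inference can be sketched as follows. This is a minimal illustration, not the paper's implementation: the encoders here are fixed random linear maps (the paper's would be learned neural networks), and all dimensions and names (`W_reward`, `W_dynamics`, `encode`) are hypothetical. The point is the split of the context: the reward encoder sees (s, a, r) tuples, the dynamics encoder sees (s, a, s') tuples, and each produces its own latent task code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
obs_dim, act_dim, latent_dim = 4, 2, 3

# Stand-in "encoders": single linear maps. The paper's encoders would be
# trained networks, possibly producing a distribution over latents.
W_reward = rng.normal(size=(obs_dim + act_dim + 1, latent_dim))    # (s, a, r)  -> z_reward
W_dynamics = rng.normal(size=(2 * obs_dim + act_dim, latent_dim))  # (s, a, s') -> z_dynamics

def encode(context, W):
    """Average per-transition features over the batch, then project to a latent code."""
    return context.mean(axis=0) @ W

# A toy context of 5 transitions: states s, actions a, rewards r, next states s'.
s = rng.normal(size=(5, obs_dim))
a = rng.normal(size=(5, act_dim))
r = rng.normal(size=(5, 1))
s_next = rng.normal(size=(5, obs_dim))

# Each encoder samples only the context relevant to its factor of the task.
z_reward = encode(np.concatenate([s, a, r], axis=1), W_reward)
z_dynamics = encode(np.concatenate([s, a, s_next], axis=1), W_dynamics)

# The policy would then condition on the combined task code.
z_task = np.concatenate([z_reward, z_dynamics])
print(z_task.shape)  # (6,)
```

Splitting the context this way means each encoder infers over a smaller space (reward structure or transition dynamics alone) rather than the full joint task space, which is the reduction in inference complexity the abstract refers to.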
Acknowledgments
This work was supported by Beijing University of Posts and Telecommunications China Mobile Research Institute Joint Innovation Center.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chen, J., Zhang, C., Hu, Z. (2024). Meta-Reinforcement Learning Algorithm Based on Reward and Dynamic Inference. In: Yang, DN., Xie, X., Tseng, V.S., Pei, J., Huang, JW., Lin, J.CW. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science(), vol 14647. Springer, Singapore. https://doi.org/10.1007/978-981-97-2259-4_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2261-7
Online ISBN: 978-981-97-2259-4