
MAML2: meta reinforcement learning via meta-learning for task categories

  • Research Article
  • Published in Frontiers of Computer Science

Abstract

Meta-learning has been widely applied to few-shot reinforcement learning, where the goal is to obtain an agent that can learn quickly in a new task. However, existing algorithms often ignore isolated tasks in pursuit of average performance, which may cause negative adaptation on those tasks, and they usually require sufficient learning over a stationary task distribution. In this paper, we present a hierarchical framework of double meta-learning, which consists of three processes: classification, meta-learning, and re-adaptation. First, in the classification process, we group tasks into several subsets, regarded as task categories, according to the learned parameters of each task; this separates out the isolated tasks. Second, in the meta-learning process, we learn a category parameter for each subset via meta-learning; simultaneously, based on the gradients of the category parameters, we apply meta-learning again to learn a new meta-parameter for the whole task set, which serves as the initial parameter for a new task. Finally, in the re-adaptation process, we adapt the parameter of the new task in two steps, using the meta-parameter and then the appropriate category parameter. Experimentally, we demonstrate that our algorithm prevents negative adaptation without losing average performance over the whole task set, and that it adapts more rapidly during re-adaptation. Moreover, our algorithm performs well with fewer samples when the agent is placed in an online meta-learning setting.
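To make the three stages concrete, below is a minimal sketch in Python/NumPy of a framework of this shape, applied to toy 1-D sine regression rather than the paper's reinforcement-learning benchmarks. The random-feature model, the use of k-means on adapted parameters to form task categories, the Reptile-style first-order updates that stand in for the paper's meta-gradients, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a classification / meta-learning / re-adaptation pipeline
# on toy sine regression (NOT the paper's RL setting or exact algorithm).
import numpy as np

rng = np.random.default_rng(0)
DIM = 40  # parameters of a tiny random-feature linear model

FEAT_W = rng.normal(size=(1, DIM))
FEAT_B = rng.uniform(0, 2 * np.pi, DIM)

def make_task():
    # a task = one (amplitude, phase) pair of a sine wave
    return rng.uniform(0.5, 5.0), rng.uniform(0, np.pi)

def features(x):
    return np.cos(x[:, None] * FEAT_W + FEAT_B)

def sample_batch(task, n=20):
    amp, phase = task
    x = rng.uniform(-5, 5, n)
    return features(x), amp * np.sin(x + phase)

def inner_adapt(theta, task, steps=5, lr=0.01):
    """Inner loop: a few gradient steps of squared loss on one task."""
    for _ in range(steps):
        X, y = sample_batch(task)
        grad = 2 * X.T @ (X @ theta - y) / len(y)
        theta = theta - lr * grad
    return theta

# ---- Stage 1: classify tasks into categories by their adapted parameters ----
tasks = [make_task() for _ in range(40)]
theta0 = np.zeros(DIM)
adapted = np.stack([inner_adapt(theta0, t) for t in tasks])
K = 4
centers = adapted[rng.choice(len(tasks), K, replace=False)]
for _ in range(10):  # plain k-means in parameter space
    assign = np.argmin(((adapted[:, None] - centers) ** 2).sum(-1), axis=1)
    for k in range(K):
        if np.any(assign == k):
            centers[k] = adapted[assign == k].mean(0)

# ---- Stage 2: meta-learn one parameter per category (Reptile-style updates),
#      then aggregate the category updates into a global meta-parameter ----
cat_params = [theta0.copy() for _ in range(K)]
meta_param = theta0.copy()
META_LR = 0.1
for epoch in range(100):
    cat_deltas = []
    for k in range(K):
        subset = [t for t, a in zip(tasks, assign) if a == k]
        if not subset:
            continue
        task = subset[rng.integers(len(subset))]
        adapted_k = inner_adapt(cat_params[k], task)
        delta = adapted_k - cat_params[k]          # outer update for this category
        cat_params[k] = cat_params[k] + META_LR * delta
        cat_deltas.append(delta)
    # second meta-learning step: fold all category updates into the global parameter
    meta_param = meta_param + META_LR * np.mean(cat_deltas, axis=0)

# ---- Stage 3: two-step re-adaptation of a new task ----
new_task = make_task()
theta_new = inner_adapt(meta_param, new_task, steps=3)   # step 1: start from the meta-parameter
dists = [np.linalg.norm(theta_new - c) for c in cat_params]
# step 2: continue from the nearest category parameter (one plausible reading of the two-step scheme)
theta_new = inner_adapt(cat_params[int(np.argmin(dists))], new_task)
X, y = sample_batch(new_task, n=200)
print("test MSE after two-step adaptation:", np.mean((X @ theta_new - y) ** 2))
```

The first adaptation step is used here both to fit the new task coarsely and to identify its category; the second step then refines from the chosen category parameter, mirroring the paper's idea of initializing from the meta-parameter before specializing with the appropriate category parameter.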



Acknowledgements

This work was financially supported by the National Key R&D Program of China (2020YFC2006602), the National Natural Science Foundation of China (Grant Nos. 62072324, 61876217, 61876121, 61772357), University Natural Science Foundation of Jiangsu Province (No. 21KJA520005), Primary Research and Development Plan of Jiangsu Province (BE2020026), Natural Science Foundation of Jiangsu Province (BK20190942), and Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX21_3020).

Author information


Corresponding author

Correspondence to Jianping Chen.

Additional information

Qiming Fu received his PhD degree from the School of Computer Science and Technology of Soochow University, China in 2014. He is an associate professor at the School of Electronics and Information Engineering, Suzhou University of Science and Technology, China. His research interests include reinforcement learning, deep learning, and intelligent information processing.

Zhechao Wang is currently working towards his master’s degree at the School of Electronics and Information Engineering, Suzhou University of Science and Technology, China. His research interests lie in reinforcement learning, including meta-learning and transfer learning.

Nengwei Fang graduated from the Department of Mechanical Engineering, Tsinghua University, China, and received a Master's degree in Materials Science and Engineering. His research interests include applications of industrial big data. He is currently the Chairman of the Board of Directors of Chongqing Innovation Center of Industrial Big-Data Co., Ltd.

Bin Xing is the chief scientist of the National Engineering Laboratory of Industrial Big-Data Application Technology and a visiting professor at the University of Electronic Science and Technology of China. From 1998 to 2017, he worked at Dassault Group and Atos Origin in France as a senior data consultant and data analysis scientist. He received his PhD in data analysis and processing from École Centrale de Paris, France, in 1998. His main research interests include industrial big data analysis, key technologies of intelligent manufacturing, industrial mechanism models, and AI applications.

Xiao Zhang received his PhD degree from the Department of Computer Science and Electrical Engineering of the University of Wisconsin-Milwaukee, USA. He is a professor and the dean at the School of Medical Informatics, Xuzhou Medical University, China. His research interests include precision medicine, artificial intelligence, and machine learning in the healthcare and medical sectors.

Jianping Chen received his Bachelor's degree from Southeast University, China, his Master's degree from the New Jersey Institute of Technology, USA, his MBA from Washington University in St. Louis, USA, and his doctoral degree from the University of Nice Sophia Antipolis, France. He is currently a professor at Suzhou University of Science and Technology, a director of the Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, and a director of the Suzhou City Key Laboratory of Mobile Networking and Applied Technology. His research interests include big data and analytics, building energy efficiency, and machine learning.

About this article

Cite this article

Fu, Q., Wang, Z., Fang, N. et al. MAML2: meta reinforcement learning via meta-learning for task categories. Front. Comput. Sci. 17, 174325 (2023). https://doi.org/10.1007/s11704-022-2037-1
