
MAML2: meta reinforcement learning via meta-learning for task categories

  • Research Article
  • Published in Frontiers of Computer Science

Abstract

Meta-learning has been widely applied to few-shot reinforcement learning, where the goal is to obtain an agent that can learn quickly in a new task. However, existing algorithms often ignore isolated tasks in pursuit of average performance, which may cause negative adaptation on those tasks, and they usually require sufficient learning over a stationary task distribution. In this paper, we present a hierarchical framework of double meta-learning, which consists of three processes: classification, meta-learning, and re-adaptation. First, in the classification process, we group tasks into several subsets, regarded as task categories, according to the learned parameters of each task; this separates out the isolated tasks. Second, in the meta-learning process, we learn a category parameter for each subset via meta-learning; simultaneously, based on the gradients of the category parameters, we apply meta-learning again to learn a new meta-parameter for the whole task set, which serves as the initial parameter for a new task. Finally, in the re-adaptation process, we adapt the parameter of the new task in two steps, using the meta-parameter and then the appropriate category parameter. Experimentally, we demonstrate that our algorithm prevents negative adaptation without losing average performance over the whole task set, and that it adapts more rapidly during re-adaptation. Moreover, our algorithm performs well with fewer samples when the agent is placed in an online meta-learning setting.
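To make the three stages concrete, below is a minimal sketch in Python/NumPy of a framework of this shape, applied to toy 1-D sine regression rather than the paper's reinforcement-learning benchmarks. The random-feature model, the use of k-means on adapted parameters to form task categories, the Reptile-style first-order updates that stand in for the paper's meta-gradients, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a classification / meta-learning / re-adaptation pipeline
# on toy sine regression (NOT the paper's RL setting or exact algorithm).
import numpy as np

rng = np.random.default_rng(0)
DIM = 40  # parameters of a tiny random-feature linear model

FEAT_W = rng.normal(size=(1, DIM))
FEAT_B = rng.uniform(0, 2 * np.pi, DIM)

def make_task():
    # a task = one (amplitude, phase) pair of a sine wave
    return rng.uniform(0.5, 5.0), rng.uniform(0, np.pi)

def features(x):
    return np.cos(x[:, None] * FEAT_W + FEAT_B)

def sample_batch(task, n=20):
    amp, phase = task
    x = rng.uniform(-5, 5, n)
    return features(x), amp * np.sin(x + phase)

def inner_adapt(theta, task, steps=5, lr=0.01):
    """Inner loop: a few gradient steps of squared loss on one task."""
    for _ in range(steps):
        X, y = sample_batch(task)
        grad = 2 * X.T @ (X @ theta - y) / len(y)
        theta = theta - lr * grad
    return theta

# ---- Stage 1: classify tasks into categories by their adapted parameters ----
tasks = [make_task() for _ in range(40)]
theta0 = np.zeros(DIM)
adapted = np.stack([inner_adapt(theta0, t) for t in tasks])
K = 4
centers = adapted[rng.choice(len(tasks), K, replace=False)]
for _ in range(10):  # plain k-means in parameter space
    assign = np.argmin(((adapted[:, None] - centers) ** 2).sum(-1), axis=1)
    for k in range(K):
        if np.any(assign == k):
            centers[k] = adapted[assign == k].mean(0)

# ---- Stage 2: meta-learn one parameter per category (Reptile-style updates),
#      then aggregate the category updates into a global meta-parameter ----
cat_params = [theta0.copy() for _ in range(K)]
meta_param = theta0.copy()
META_LR = 0.1
for epoch in range(100):
    cat_deltas = []
    for k in range(K):
        subset = [t for t, a in zip(tasks, assign) if a == k]
        if not subset:
            continue
        task = subset[rng.integers(len(subset))]
        adapted_k = inner_adapt(cat_params[k], task)
        delta = adapted_k - cat_params[k]          # outer update for this category
        cat_params[k] = cat_params[k] + META_LR * delta
        cat_deltas.append(delta)
    # second meta-learning step: fold all category updates into the global parameter
    meta_param = meta_param + META_LR * np.mean(cat_deltas, axis=0)

# ---- Stage 3: two-step re-adaptation of a new task ----
new_task = make_task()
theta_new = inner_adapt(meta_param, new_task, steps=3)   # step 1: start from the meta-parameter
dists = [np.linalg.norm(theta_new - c) for c in cat_params]
# step 2: continue from the nearest category parameter (one plausible reading of the two-step scheme)
theta_new = inner_adapt(cat_params[int(np.argmin(dists))], new_task)
X, y = sample_batch(new_task, n=200)
print("test MSE after two-step adaptation:", np.mean((X @ theta_new - y) ** 2))
```

The first adaptation step is used here both to fit the new task coarsely and to identify its category; the second step then refines from the chosen category parameter, mirroring the paper's idea of initializing from the meta-parameter before specializing with the appropriate category parameter.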



Acknowledgements

This work was financially supported by the National Key R&D Program of China (2020YFC2006602), the National Natural Science Foundation of China (Grant Nos. 62072324, 61876217, 61876121, 61772357), University Natural Science Foundation of Jiangsu Province (No. 21KJA520005), Primary Research and Development Plan of Jiangsu Province (BE2020026), Natural Science Foundation of Jiangsu Province (BK20190942), and Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX21_3020).

Author information


Corresponding author

Correspondence to Jianping Chen.

Additional information

Qiming Fu received his PhD degree from the School of Computer Science and Technology of Soochow University, China in 2014. He is an associate professor at the School of Electronics and Information Engineering, Suzhou University of Science and Technology, China. His research interests include reinforcement learning, deep learning, and intelligent information processing.

Zhechao Wang is currently working towards his master’s degree at the School of Electronics and Information Engineering, Suzhou University of Science and Technology, China. His research interests lie in reinforcement learning, including meta-learning and transfer learning.

Nengwei Fang graduated from the Department of Mechanical Engineering, Tsinghua University, China, and received a Master's degree in Materials Science and Engineering. His research interests include applications of industrial big data. He is currently the Chairman of the Board of Directors of Chongqing Innovation Center of Industrial Big-Data Co., Ltd.

Bin Xing is the chief scientist of the National Engineering Laboratory of Industrial Big-Data Application Technology and a visiting professor at the University of Electronic Science and Technology of China. From 1998 to 2017, he worked at Dassault Group and Atos Origin in France as a senior data consultant and data analysis scientist. He received his PhD in data analysis and processing from École Centrale de Paris, France, in 1998. His main research interests include industrial big data analysis, key technologies of intelligent manufacturing, industrial mechanism models, and AI applications.

Xiao Zhang received his PhD degree from the Department of Computer Science and Electrical Engineering of the University of Wisconsin-Milwaukee, USA. He is a professor and the dean at the School of Medical Informatics, Xuzhou Medical University, China. His research interests include precision medicine, artificial intelligence, and machine learning in the healthcare and medical sectors.

Jianping Chen received his Bachelor's degree from Southeast University, China, his Master's degree from the New Jersey Institute of Technology, USA, his MBA from Washington University in St. Louis, USA, and his doctoral degree from the University of Nice Sophia Antipolis, France. He is currently a professor at Suzhou University of Science and Technology, a director of the Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, and a director of the Suzhou City Key Laboratory of Mobile Networking and Applied Technology. His research interests include big data and analytics, building energy efficiency, and machine learning.

About this article

Cite this article

Fu, Q., Wang, Z., Fang, N. et al. MAML2: meta reinforcement learning via meta-learning for task categories. Front. Comput. Sci. 17, 174325 (2023). https://doi.org/10.1007/s11704-022-2037-1
