ABSTRACT
Deep learning enables traditional reinforcement learning methods to handle complex problems with large state spaces. However, deep reinforcement learning has notable drawbacks: the trained agent has limited exploration ability, training requires a large amount of computing resources, and learning may converge to a local optimum. In this paper, imitation learning is incorporated into the training process to increase the exploration ability of multiple deep reinforcement learning agents. Experiments are carried out in the multi-agent Overcooked environment. The results show that the performance of multiple agents cooperatively learning alongside an expert strategy is significantly improved. This suggests that expert-assisted training can help reinforcement learning agents solve complex problems.
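The paper does not specify how the imitation signal is combined with the reinforcement learning objective, but a common way to realize "expert-assisted training" of this kind is to add a behavior-cloning term to the agent's loss. The sketch below is a minimal, hypothetical illustration (the function names, the mixing weight `bc_weight`, and the additive combination are all assumptions, not the authors' method): the expert's demonstrated actions are scored under the current policy, and their negative log-likelihood is added to whatever RL loss the agent already optimizes.

```python
import numpy as np

def bc_loss(policy_probs, expert_actions):
    """Behavior-cloning loss: negative log-likelihood of the expert's
    actions under the current policy's action distribution.

    policy_probs   : (batch, n_actions) array of action probabilities
    expert_actions : (batch,) array of action indices taken by the expert
    """
    rows = np.arange(len(expert_actions))
    # Small epsilon guards against log(0) for actions the policy never takes.
    return -np.mean(np.log(policy_probs[rows, expert_actions] + 1e-8))

def combined_loss(rl_loss, policy_probs, expert_actions, bc_weight=0.5):
    """Hypothetical joint objective: RL loss plus a weighted imitation term.

    In practice bc_weight is often annealed toward 0 so the expert guides
    early exploration without constraining the final policy.
    """
    return rl_loss + bc_weight * bc_loss(policy_probs, expert_actions)

# Toy example: a near-uniform policy pays a large imitation penalty,
# a policy that already matches the expert pays almost none.
uniform = np.array([[0.5, 0.5], [0.5, 0.5]])
matched = np.array([[0.99, 0.01], [0.01, 0.99]])
expert = np.array([0, 1])
print(combined_loss(1.0, uniform, expert))  # RL loss + sizeable BC term
print(combined_loss(1.0, matched, expert))  # close to the bare RL loss
```

Because the imitation term only shapes the gradient, the same scheme applies per-agent in a multi-agent setting, with each learner imitating the expert for its own role.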