Abstract
The application of deep reinforcement learning (DRL) algorithms in multi-agent environments has become increasingly popular. However, most DRL algorithms do not address the problem of group cooperation: each agent explores in the direction most beneficial to itself while ignoring the situation of its teammates, which easily leads to local optima. This paper tackles this problem in a multi-UAV confrontation scenario. We seek the optimal cooperative policy by dividing UAVs into several groups and letting them learn to cooperate with teammates autonomously. Specifically, we propose an algorithm called group-based actor-critic (GBAC). UAVs are grouped according to an observation radius, and a double Q network processes the rewards, which are split into individual rewards and group rewards: the Q network handles individual rewards, while the group-Q network handles group rewards. As a result, UAVs obtain higher rewards through group cooperation. UAVs trained with our method outperform those trained with other DRL methods. In summary, we use a group-based DRL method to solve the group-cooperation problem and maximize the expected return in multi-UAV confrontation.
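The abstract's two core ideas, grouping UAVs by an observation radius and blending individual-reward and group-reward value estimates, can be sketched as follows. This is a minimal illustration under our own assumptions (union-find over the proximity graph, and a fixed blending weight `w`); the paper's actual GBAC implementation and network details are not reproduced here.

```python
import math

def group_by_radius(positions, radius):
    """Group UAVs whose pairwise distance lies within the observation
    radius, via union-find over the proximity graph. Hypothetical
    helper illustrating the grouping step, not the paper's exact code."""
    n = len(positions)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= radius:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def combined_td_target(r_ind, r_grp, q_next, gq_next, gamma=0.95, w=0.5):
    """Blend the individual-reward TD target (Q network) with the
    group-reward TD target (group-Q network). The weight w and the
    linear blend are assumptions made for illustration."""
    individual = r_ind + gamma * q_next
    group = r_grp + gamma * gq_next
    return (1 - w) * individual + w * group
```

For example, with UAVs at (0, 0), (1, 0), and (10, 0) and radius 2, the first two form one group and the third is alone; the blended target then lets each critic's reward signal influence the update in proportion to `w`.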
Acknowledgments
This work is supported by the World Park of Digital Economy EVONature Foundation and the National Science Foundation of China under Project No. 61472476.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Li, S., Wang, B., Xie, T. (2021). Group-Based Deep Reinforcement Learning in Multi-UAV Confrontation. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Communications in Computer and Information Science, vol 1516. Springer, Cham. https://doi.org/10.1007/978-3-030-92307-5_72
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92306-8
Online ISBN: 978-3-030-92307-5
eBook Packages: Computer Science (R0)