Abstract
Finding the optimal game strategy is a difficult problem in unmanned aerial vehicle (UAV) swarm confrontation. As an effective solution to the sequential decision-making problem, multi-agent reinforcement learning (MARL) provides a promising way to realize intelligent countermeasures. However, there are two challenges in applying MARL to large-scale UAV swarm confrontation: i) the curse of dimensionality caused by the excessive scale of UAV clusters and ii) the generalization problem caused by the dynamically changing UAV cluster size. To address these problems, we propose a novel MARL paradigm, called Weighted Mean Field Reinforcement Learning, where the pairwise communication between any UAV and its neighbors is modeled as that between a central UAV and the virtual UAV, which is abstracted from the weighted mean effect of neighboring UAVs. This approach reduces the multi-agent problem to a two-agent problem, which can reduce the input dimension of the agent and adapt to the changing cluster size. The communication content between UAVs includes actions and local observations. Actions can enhance the cooperation between UAVs and alleviate the non-stationarity of the environment, while local observations can expand the perception range of the central UAV so that it can obtain more useful information about the environment. The attention mechanism is leveraged to enable UAVs to select more valuable information flexibly, making our method more scalable than other algorithms. Combining this paradigm with double Q-learning and actor-critic algorithms, we propose weighted mean field Q-learning (WMFQ) and weighted mean field actor-critic (WMFAC) algorithms. Experiments on our constructed UAV swarm confrontation environment verify the effectiveness and scalability of our algorithms.
Similar content being viewed by others
Data Availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
References
Zhou L, Leng S, Liu Q, Wang Q (2022) Intelligent uav swarm cooperation for multiple targets tracking. IEEE Internet Things J 9(1):743–754. https://doi.org/10.1109/JIOT.2021.3085673
Sun Z, Piao H, Yang Z, Zhao Y, Zhan G, Zhou D, Meng G, Chen H, Chen X, Qu B et al (2021) Multi-agent hierarchical policy gradient for air combat tactics emergence via self-play. Eng Appl Artif Intell 98:104112. https://doi.org/10.1016/j.engappai.2020.104112
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK (2015) Ostrovski, G., others : Human-level control through deep reinforcement learning. Nature 518(7540):529–533. https://doi.org/10.1038/nature14236
Mnih V, Badia AP, Mirza M, Graves A, Lillicrap T, Harley T, Silver D, Kavukcuoglu K (2016) Asynchronous methods for deep reinforcement learning. In: Proceedings of The 33rd international conference on machine learning, vol 48. PMLR, pp 1928–1937
Lowe R, Wu Y, Tamar A, Harb J, Abbeel P, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. In: Proceedings of the 31st international conference on neural information processing systems, vol 30. MIT Press, pp 6382–6393
Tampuu A, Matiisen T, Kodelja D, Kuzovkin I, Korjus K, Aru J, Aru J, Vicente R (2017) Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE 12(4):0172395. https://doi.org/10.1371/journal.pone.0172395
Gupta JK, Egorov M, Kochenderfer M (2017) Cooperative multi-agent control using deep reinforcement learning. In: International conference on autonomous agents and multiagent systems, vol 10642. Springer, pp 66–83. https://doi.org/10.1007/978-3-319-71682-4_5
Yang Y, Luo R, Li M, Zhou M, Zhang W, Wang J (2018) Mean field multi-agent reinforcement learning. In: International conference on machine learning, vol 80. PMLR, pp 5571–5580
Hasselt HV, Guez A, Silver D (2016) Deep reinforcement learning with double q-learning. In: Proceedings of the Thirtieth AAAI conference on artificial intelligence, vol 30. AAAI Press, pp 2094–2100
Konda VR, Tsitsiklis JN (2000) Actor-critic algorithms. In: Advances in neural information processing systems, vol 12. MIT press, pp 1008–1014. https://doi.org/10.1137/S0363012901385691
Shao S, Peng Y, He C, Du Y (2020) Efficient path planning for uav formation via comprehensively improved particle swarm optimization. ISA Trans 97:415–430. https://doi.org/10.1016/j.isatra.2019.08.018
He W, qi X, Liu L (2021) A novel hybrid particle swarm optimization for multi-uav cooperate path planning. Appl Intell 51:7350–7364. https://doi.org/10.1007/s10489-020-02082-8
Xu C, Xu M, Yin C (2020) Optimized multi-uav cooperative path planning under the complex confrontation environment. Comput Commun 162:196–203. https://doi.org/10.1016/j.comcom.2020.04.050
Qiu H, Duan H (2020) A multi-objective pigeon-inspired optimization approach to uav distributed flocking among obstacles. Inf Sci 509:515–529. https://doi.org/10.1016/j.ins.2018.06.061
Luo L, Wang X, Ma J, Ong Y-S (2021) Grpavoid: Multigroup collision-avoidance control and optimization for uav swarm. IEEE Trans Cybern, 1–14. https://doi.org/10.1109/TCYB.2021.3132044
Wu X, Chen H, Chen C, Zhong M, Xie S, Guo Y, Fujita H (2020) The autonomous navigation and obstacle avoidance for usvs with anoa deep reinforcement learning method. Knowl-Based Syst 196:105201. https://doi.org/10.1016/j.knosys.2019.105201
Yan C, Wang C, Xiang X, Lan Z, Jiang Y (2022) Deep reinforcement learning of collision-free flocking policies for multiple fixed-wing uavs using local situation maps. IEEE Trans on Industr Inform 18(2):1260–1270. https://doi.org/10.1109/TII.2021.3094207
Vinyals O, Babuschkin I, Czarnecki WM, Mathieu M, Dudzik A, Chung J, Choi DH, Powell R, Ewalds T, Georgiev P et al (2019) Grandmaster level in starcraft ii using multi-agent reinforcement learning. Nature 575 (7782):350–354. https://doi.org/10.1038/s41586-019-1724-z
Schrittwieser J, Antonoglou I, Hubert T, Simonyan K, Sifre L, Schmitt S, Guez A, Lockhart E, Hassabis D, Graepel T et al (2020) Mastering atari, go, chess and shogi by planning with a learned model. Nature 588(7839):604–609. https://doi.org/10.1038/s41586-020-03051-4
Kiran BR, Sobh I, Talpaert V, Mannion P, Sallab AAA, Yogamani S, Pérez P (2021) Deep reinforcement learning for autonomous driving: a survey. IEEE Trans Intell Transp Syst, 1–18. https://doi.org/10.1109/TITS.2021.3054625
Xu X, Zuo L, Li X, Qian L, Ren J, Sun Z (2018) A reinforcement learning approach to autonomous decision making of intelligent vehicles on highways. IEEE Transactions on Systems, Man, and Cybernetics: Systems 50(10):3884–3897. https://doi.org/10.1109/TSMC.2018.2870983
Zhang Y, Zhou Y, Lu H, Fujita H (2021) Cooperative multi-agent actor–critic control of traffic network flow based on edge computing. Futur Gener Comput Syst 123:128–141. https://doi.org/10.1016/j.future.2021.04.018
Wang X, Ke L, Qiao Z, Chai X (2021) Large-scale traffic signal control using a novel multiagent reinforcement learning. IEEE Trans Cybern 51(1):174–187. https://doi.org/10.1109/TCYB.2020.3015811
Foerster J, Nardelli N, Farquhar G, Afouras T, Torr PH, Kohli P, Whiteson S (2017) Stabilising experience replay for deep multi-agent reinforcement learning. In: International conference on machine learning, vol 70. PMLR, pp 1146–1155
Jiang H, Shi D, Xue C, Wang Y, Wang G, Zhang Y (2021) Multi-agent deep reinforcement learning with type-based hierarchical group communication. Appl Intell 51:5793–5808. https://doi.org/10.1007/s10489-020-02065-9
Jiang H, Shi D, Xue C, Wang Y, Wang G, Zhang Y (2020) Ghgc: Goal-based hierarchical group communication in multi-agent reinforcement learning. In: 2020 IEEE international conference on systems, man, and cybernetics (SMC). IEEE, pp 3507–3514. https://doi.org/10.1109/SMC42975.2020.9282974
Albrecht SV, Stone P (2018) Autonomous agents modelling other agents: a comprehensive survey and open problems. Artif Intell 258:66–95. https://doi.org/10.1016/j.artint.2018.01.002
He H, Boyd-Graber J, Kwok K, Daumé H III (2016) Opponent modeling in deep reinforcement learning. In: International conference on machine learning, vol 48. PMLR, pp 1804–1813
Gao P, Zhang Q, Wang F, Xiao L, Fujita H, Zhang Y (2020) Learning reinforced attentional representation for end-to-end visual tracking. Inf Sci 517:52–67. https://doi.org/10.1016/j.ins.2019.12.084
Iqbal S, Sha F (2019) Actor-attention-critic for multi-agent reinforcement learning. In: International conference on machine learning, vol 97. PMLR, pp 2961–2970
Lever G, Heess N, Degris T, Wierstra D, Riedmiller M (2014) Deterministic policy gradient algorithms. In: International conference on machine learning, vol 32. PMLR, pp 387–395
Acknowledgments
This research was funded by Postgraduate Scientific Research Innovation Project of Hunan Province (grant No.CX20210030).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of Interests
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Wang, B., Li, S., Gao, X. et al. Weighted mean field reinforcement learning for large-scale UAV swarm confrontation. Appl Intell 53, 5274–5289 (2023). https://doi.org/10.1007/s10489-022-03840-6
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-022-03840-6