Weighted mean field reinforcement learning for large-scale UAV swarm confrontation

Wang, Baolai; Li, Shengang; Gao, Xianzhong; Xie, Tao

doi:10.1007/s10489-022-03840-6

Published: 21 June 2022

Volume 53, pages 5274–5289, (2023)

Applied Intelligence

Baolai Wang (ORCID: orcid.org/0000-0003-1189-0726)¹, Shengang Li¹, Xianzhong Gao² & Tao Xie¹

Abstract

Finding the optimal game strategy is a difficult problem in unmanned aerial vehicle (UAV) swarm confrontation. As an effective approach to sequential decision-making, multi-agent reinforcement learning (MARL) offers a promising way to realize intelligent countermeasures. However, applying MARL to large-scale UAV swarm confrontation raises two challenges: i) the curse of dimensionality caused by the large number of UAVs in a cluster, and ii) the generalization problem caused by cluster sizes that change dynamically. To address these problems, we propose a novel MARL paradigm, called Weighted Mean Field Reinforcement Learning, in which the pairwise communication between any UAV and its neighbors is modeled as communication between a central UAV and a virtual UAV abstracted from the weighted mean effect of the neighboring UAVs. This approach reduces the multi-agent problem to a two-agent problem, which reduces the input dimension of each agent and lets it adapt to a changing cluster size. The communication content between UAVs includes actions and local observations. Sharing actions enhances cooperation between UAVs and alleviates the non-stationarity of the environment, while sharing local observations expands the perception range of the central UAV so that it obtains more useful information about the environment. An attention mechanism lets each UAV flexibly select the more valuable information, making our method more scalable than other algorithms. Combining this paradigm with double Q-learning and actor-critic algorithms, we propose the weighted mean field Q-learning (WMFQ) and weighted mean field actor-critic (WMFAC) algorithms. Experiments in a UAV swarm confrontation environment that we constructed verify the effectiveness and scalability of our algorithms.
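The central idea of the abstract, replacing a variable set of neighbors by a single attention-weighted "virtual UAV", can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes scaled dot-product attention with query/key projections `W_q` and `W_k` (hypothetical names, random here rather than learned), discrete actions encoded one-hot, and a Q-network input formed by concatenating the central UAV's observation with the weighted mean observation and the weighted mean action. The paper's exact attention form and network inputs may differ.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()


def weighted_mean_field(central_obs, neighbor_obs, neighbor_actions, W_q, W_k, n_actions):
    """Summarize a variable number of neighbors as one virtual UAV (illustrative sketch).

    central_obs:      (d_obs,) observation of the central UAV
    neighbor_obs:     (n, d_obs) local observations shared by n neighbors
    neighbor_actions: (n,) discrete action indices of the neighbors
    W_q, W_k:         (d_obs, d_att) hypothetical query/key projections (learned in practice)
    """
    q = central_obs @ W_q                          # query from the central UAV, (d_att,)
    k = neighbor_obs @ W_k                         # one key per neighbor, (n, d_att)
    scores = k @ q / np.sqrt(W_q.shape[1])         # scaled dot-product scores, (n,)
    w = softmax(scores)                            # attention weights, sum to 1
    one_hot = np.eye(n_actions)[neighbor_actions]  # one-hot neighbor actions, (n, n_actions)
    mean_action = w @ one_hot                      # weighted mean action, (n_actions,)
    mean_obs = w @ neighbor_obs                    # weighted mean observation, (d_obs,)
    return w, mean_action, mean_obs


rng = np.random.default_rng(0)
d_obs, d_att, n_actions = 8, 4, 5
W_q = rng.normal(size=(d_obs, d_att))
W_k = rng.normal(size=(d_obs, d_att))
own_obs = rng.normal(size=d_obs)

for n in (3, 7):  # the number of neighbors around the central UAV changes
    obs_n = rng.normal(size=(n, d_obs))
    act_n = rng.integers(0, n_actions, size=n)
    w, a_bar, o_bar = weighted_mean_field(own_obs, obs_n, act_n, W_q, W_k, n_actions)
    # The assumed Q-network input has the same size for every n.
    q_input = np.concatenate([own_obs, o_bar, a_bar])
    print(n, q_input.shape)  # prints "3 (21,)" then "7 (21,)"
```

Because the virtual UAV is a fixed-size summary, the input dimension stays at 8 + 8 + 5 = 21 whether the central UAV has 3 or 7 neighbors. Setting `W_k` to zero makes every score equal, so the weights become uniform and the virtual UAV reduces to the plain, unweighted mean of the neighbors' actions used in standard mean field MARL.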

Data Availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Acknowledgments

This research was funded by the Postgraduate Scientific Research Innovation Project of Hunan Province (grant No. CX20210030).

Author information

Authors and Affiliations

College of Computer, National University of Defense Technology, Changsha, 410073, Hunan, China
Baolai Wang, Shengang Li & Tao Xie
College of Aerospace Science and Engineering, National University of Defense Technology, Changsha, 410073, Hunan, China
Xianzhong Gao

Corresponding author

Correspondence to Baolai Wang.

Ethics declarations

Conflict of Interests

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, B., Li, S., Gao, X. et al. Weighted mean field reinforcement learning for large-scale UAV swarm confrontation. Appl Intell 53, 5274–5289 (2023). https://doi.org/10.1007/s10489-022-03840-6

Accepted: 01 June 2022
Published: 21 June 2022
Issue Date: March 2023
DOI: https://doi.org/10.1007/s10489-022-03840-6
