
Weighted mean field reinforcement learning for large-scale UAV swarm confrontation

Applied Intelligence

Abstract

Finding the optimal game strategy is a difficult problem in unmanned aerial vehicle (UAV) swarm confrontation. As an effective solution to sequential decision-making problems, multi-agent reinforcement learning (MARL) offers a promising route to intelligent countermeasures. However, applying MARL to large-scale UAV swarm confrontation raises two challenges: i) the curse of dimensionality caused by the large number of UAVs in a cluster and ii) poor generalization when the cluster size changes dynamically. To address these problems, we propose a novel MARL paradigm, Weighted Mean Field Reinforcement Learning, in which the pairwise communication between any UAV and its neighbors is modeled as communication between a central UAV and a single virtual UAV abstracted from the weighted mean effect of the neighboring UAVs. This reduces the multi-agent problem to a two-agent problem, which lowers each agent's input dimension and lets it adapt to a changing cluster size. The communicated content includes both actions and local observations. Sharing actions strengthens cooperation between UAVs and alleviates the non-stationarity of the environment, while sharing local observations extends the perception range of the central UAV so that it obtains more useful information about the environment. An attention mechanism lets each UAV flexibly select the more valuable information, making our method more scalable than other algorithms. Combining this paradigm with double Q-learning and actor-critic algorithms, we propose the weighted mean field Q-learning (WMFQ) and weighted mean field actor-critic (WMFAC) algorithms. Experiments in a UAV swarm confrontation environment that we constructed verify the effectiveness and scalability of our algorithms.
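The abstract describes the virtual-UAV abstraction only in words. A minimal sketch of how it could look follows, under assumptions that this preview does not confirm: neighbor weights come from scaled dot-product attention between the central UAV's query and each neighbor's key, actions are one-hot vectors, the virtual UAV's action and observation are the attention-weighted means of the neighbors' actions and observations, and the Q-function input concatenates the central UAV's observation with the virtual UAV's observation and action. `double_q_target` is the generic double Q-learning target (the online network selects the next action, the target network evaluates it), not necessarily the exact WMFQ update. All function names (`virtual_uav`, `q_input`, `double_q_target`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def weighted_mean(vectors: Sequence[Sequence[float]], weights: Sequence[float]) -> Vector:
    """Return sum_k weights[k] * vectors[k] for equally sized vectors."""
    dim = len(vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)]


def virtual_uav(
    query: Sequence[float],
    neighbor_keys: Sequence[Sequence[float]],
    neighbor_actions: Sequence[Sequence[float]],
    neighbor_obs: Sequence[Sequence[float]],
) -> Tuple[Vector, Vector, Vector]:
    """Collapse any number of neighbors into one hypothetical 'virtual UAV'.

    Assumption: scaled dot-product attention between the central UAV's query
    and each neighbor's key produces the weights; the virtual UAV's action and
    observation are the attention-weighted means of the neighbors' ones.
    Returns (weights, weighted mean action, weighted mean observation).
    """
    if not neighbor_keys:
        raise ValueError("at least one neighbor is required")
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, key) / scale for key in neighbor_keys])
    return (
        weights,
        weighted_mean(neighbor_actions, weights),
        weighted_mean(neighbor_obs, weights),
    )


def q_input(own_obs: Sequence[float], mean_obs: Sequence[float],
            mean_action: Sequence[float]) -> Vector:
    """Two-agent input: own observation plus the virtual UAV's observation and action.

    Its length depends only on per-UAV observation/action sizes, not on how
    many neighbors were averaged.
    """
    return list(own_obs) + list(mean_obs) + list(mean_action)


def double_q_target(reward: float, gamma: float, q_online_next: Sequence[float],
                    q_target_next: Sequence[float], done: bool) -> float:
    """Generic double Q-learning target: the online network picks the greedy
    next action and the target network evaluates it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]


if __name__ == "__main__":
    # Toy case: 3 neighbors, 2-D keys, one-hot actions over 3 moves, 1-D observations.
    w, mean_a, mean_o = virtual_uav(
        [1.0, 0.0],
        [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
        [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
        [[0.5], [1.0], [0.0]],
    )
    print(len(q_input([0.2], mean_o, mean_a)))  # 5 = 1 + 1 + 3, independent of neighbor count
```

Because the neighbors are collapsed into a weighted mean, the input size stays fixed as the swarm grows or shrinks, which is the property the abstract credits for handling a changing cluster size. Unlike the uniform average of neighbor actions used in plain mean field MARL, the attention weights let the more relevant neighbors dominate the virtual UAV.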

Data Availability

The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.

Acknowledgments

This research was funded by the Postgraduate Scientific Research Innovation Project of Hunan Province (grant No. CX20210030).

Author information

Correspondence to Baolai Wang.

Ethics declarations

Conflict of Interests

The authors have no relevant financial or non-financial interests to disclose.

About this article

Cite this article

Wang, B., Li, S., Gao, X. et al. Weighted mean field reinforcement learning for large-scale UAV swarm confrontation. Appl Intell 53, 5274–5289 (2023). https://doi.org/10.1007/s10489-022-03840-6

  • DOI: https://doi.org/10.1007/s10489-022-03840-6
