
UAVs rounding up inspired by communication multi-agent depth deterministic policy gradient

Published in Applied Intelligence

Abstract

UAV rounding-up is a game between a UAV swarm and its targets. The main challenges lie in achieving efficient collaboration among the UAVs and in setting the rounding-up points. This paper extends our previous work in three aspects: establishing an information-interaction strategy model, introducing dynamic rounding-up points, and refining the reward function. Inspired by the intelligence of biological swarms, we construct a communication multi-agent deep deterministic policy gradient (COM-MADDPG) framework based on the communication topology of the rounding-up process, and propose an information-interaction strategy that serves as the action policy in reinforcement learning. Rounding up is no longer limited to a fixed threshold: dynamic rounding-up points are used to judge mission success, so that each UAV has its own rounding-up area and the UAVs cooperate to complete the swarm mission. For situations where the target sits at a corner or an edge, the reward function is redefined, which effectively avoids rounding-up failures in these special cases. Simulation results verify that the COM-MADDPG framework outperforms DDPG and MADDPG in rounding-up tasks and improves the success rate, confirming the effectiveness of its decision-making in those special situations and the robustness of the approach.
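To make the dynamic rounding-up criterion concrete, the following is a minimal sketch, not the authors' implementation: it assumes the capture points are spaced evenly on a circle that moves with the target, that UAV i is matched to point i, and that success requires every UAV to be within a tolerance of its own point. The names (`roundup_points`, `roundup_succeeded`, `capture_radius`, `tol`) and the index-based assignment are illustrative assumptions.

```python
import numpy as np

def roundup_points(target_pos, n_uavs, capture_radius):
    # Hypothetical sketch: n capture points spaced evenly on a circle
    # centred on the (moving) target, so the points are dynamic rather
    # than a fixed threshold region.
    angles = np.linspace(0.0, 2.0 * np.pi, n_uavs, endpoint=False)
    offsets = capture_radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return np.asarray(target_pos) + offsets  # shape (n_uavs, 2)

def roundup_succeeded(uav_pos, target_pos, capture_radius, tol):
    # Success only when every UAV occupies its own rounding-up area.
    # UAV i is naively matched to point i here; the paper's assignment
    # scheme may differ.
    points = roundup_points(target_pos, len(uav_pos), capture_radius)
    dists = np.linalg.norm(np.asarray(uav_pos) - points, axis=1)
    return bool(np.all(dists < tol))
```

Evaluated once per environment step, a check of this kind replaces a single fixed capture threshold; the per-UAV distances to the assigned points could also feed a shaped reward near corners and edges.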





Author information


Corresponding author

Correspondence to Longting Jiang.

Ethics declarations

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. This work is funded by the Science and Technology Innovation 2030-Key Project of “New Generation Artificial Intelligence”, China (No. 2018AAA0102403).

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Wei Ruixuan and Wang Dong contributed equally to this work.


About this article


Cite this article

Jiang, L., Wei, R. & Wang, D. UAVs rounding up inspired by communication multi-agent depth deterministic policy gradient. Appl Intell 53, 11474–11489 (2023). https://doi.org/10.1007/s10489-022-03986-3

