
UAVs rounding up inspired by communication multi-agent depth deterministic policy gradient

Published in Applied Intelligence

Abstract

UAV rounding-up is a game between a UAV swarm and its targets. The main challenges lie in achieving efficient collaboration among the UAVs and in setting the rounding-up points. This paper extends our previous work in three aspects: establishing an information-interaction strategy model, introducing dynamic rounding-up points, and refining the reward function. Inspired by the intelligence of biological swarms, we construct a communication multi-agent deep deterministic policy gradient (COM-MADDPG) framework based on the communication topology of the rounding-up process, and propose an information-interaction strategy that serves as the action policy in reinforcement learning. Rounding up is no longer limited to a fixed threshold: dynamic rounding-up points are used to judge mission success, so that each UAV has its own rounding-up area and the UAVs cooperate to complete the swarm mission. For situations where the target sits at a corner or an edge, the reward function is redefined, which effectively avoids rounding-up failures in these special cases. Simulation results verify that the COM-MADDPG framework outperforms DDPG and MADDPG in rounding-up tasks and improves the success rate, confirming the effectiveness of its decision-making in those special situations and the robustness of the approach.
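To make the dynamic rounding-up criterion concrete, the following is a minimal sketch, not the authors' implementation: it assumes the capture points are spaced evenly on a circle that moves with the target, that UAV i is matched to point i, and that success requires every UAV to be within a tolerance of its own point. The names (`roundup_points`, `roundup_succeeded`, `capture_radius`, `tol`) and the index-based assignment are illustrative assumptions.

```python
import numpy as np

def roundup_points(target_pos, n_uavs, capture_radius):
    # Hypothetical sketch: n capture points spaced evenly on a circle
    # centred on the (moving) target, so the points are dynamic rather
    # than a fixed threshold region.
    angles = np.linspace(0.0, 2.0 * np.pi, n_uavs, endpoint=False)
    offsets = capture_radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return np.asarray(target_pos) + offsets  # shape (n_uavs, 2)

def roundup_succeeded(uav_pos, target_pos, capture_radius, tol):
    # Success only when every UAV occupies its own rounding-up area.
    # UAV i is naively matched to point i here; the paper's assignment
    # scheme may differ.
    points = roundup_points(target_pos, len(uav_pos), capture_radius)
    dists = np.linalg.norm(np.asarray(uav_pos) - points, axis=1)
    return bool(np.all(dists < tol))
```

Evaluated once per environment step, a check of this kind replaces a single fixed capture threshold; the per-UAV distances to the assigned points could also feed a shaped reward near corners and edges.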





Author information


Corresponding author

Correspondence to Longting Jiang.

Ethics declarations

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. This work is funded by the Science and Technology Innovation 2030-Key Project of “New Generation Artificial Intelligence”, China (No. 2018AAA0102403).

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Wei Ruixuan and Wang Dong contributed equally to this work.


About this article


Cite this article

Jiang, L., Wei, R. & Wang, D. UAVs rounding up inspired by communication multi-agent depth deterministic policy gradient. Appl Intell 53, 11474–11489 (2023). https://doi.org/10.1007/s10489-022-03986-3

