
Group-Based Deep Reinforcement Learning in Multi-UAV Confrontation

  • Conference paper
  • First Online:
Neural Information Processing (ICONIP 2021)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1516)


Abstract

The application of deep reinforcement learning (DRL) algorithms in multi-agent environments has become increasingly popular. However, most DRL algorithms do not address the problem of group cooperation: each agent explores in a direction that benefits itself while ignoring the situation of its teammates, which makes it easy to fall into local optima. This paper aims to solve this problem in a multi-UAV confrontation scenario. We seek the optimal cooperative policy by dividing UAVs into several groups and letting UAVs learn to cooperate with their teammates autonomously. Specifically, we propose an algorithm called group-based actor-critic (GBAC). We group UAVs by setting an observation radius and use a double Q network to process rewards, which we divide into individual rewards and group rewards: the Q network processes individual rewards, and the group-Q network processes group rewards. As a result, UAVs obtain higher rewards through group cooperation, and UAVs trained with our method outperform those trained with other DRL methods. In this paper, we use this group-based DRL method to solve the problem of group cooperation and maximize the expected return in multi-UAV confrontation.
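The grouping-by-observation-radius and the individual/group reward split described in the abstract can be illustrated with a short sketch. The following Python code is a minimal illustration under stated assumptions: the union-find grouping, the function names (assign_groups, mixed_td_target), and the mixing weight beta are not taken from the paper, and the actual GBAC networks are learned actor-critic models rather than the scalar placeholders used here.

import numpy as np

def assign_groups(positions, observation_radius):
    """Group UAVs whose pairwise distance is within the observation radius.

    positions: (n, 2) array of UAV coordinates.
    Returns one group id per UAV (union-find merge of overlapping neighborhoods).
    """
    n = len(positions)
    parent = list(range(n))

    def find(i):
        # Path-compressing find.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(positions[i] - positions[j]) <= observation_radius:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]

def mixed_td_target(r_ind, r_grp, q_next, gq_next, gamma=0.99, beta=0.5):
    """Combine individual and group rewards into one bootstrapped target.

    r_ind, q_next  : individual reward and next-state value from the Q network.
    r_grp, gq_next : group reward and next-state value from the group-Q network.
    beta weights the group term; its value here is an assumption.
    """
    y_ind = r_ind + gamma * q_next
    y_grp = r_grp + gamma * gq_next
    return (1.0 - beta) * y_ind + beta * y_grp

For example, with positions np.array([[0., 0.], [1., 0.], [5., 5.]]) and observation_radius = 2.0, assign_groups places the first two UAVs in one group and the third in its own group; mixed_td_target then mixes each UAV's individual return with its group's return, so exploration that helps teammates is also rewarded.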



Acknowledgments

This paper is supported by the World Park of Digital Economy EVONature Foundation and the National Science Foundation of China under project No. 61472476.

Author information


Corresponding author

Correspondence to Tao Xie.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Li, S., Wang, B., Xie, T. (2021). Group-Based Deep Reinforcement Learning in Multi-UAV Confrontation. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Communications in Computer and Information Science, vol 1516. Springer, Cham. https://doi.org/10.1007/978-3-030-92307-5_72


  • DOI: https://doi.org/10.1007/978-3-030-92307-5_72

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-92306-8

  • Online ISBN: 978-3-030-92307-5

  • eBook Packages: Computer Science, Computer Science (R0)
