
Reinforcement Learning for Multiple HAPS/UAV Coordination: Impact of Exploration–Exploitation Dilemma on Convergence

Conference paper in Soft Computing for Problem Solving 2019

Abstract

This work analyses the application of reinforcement learning (RL) to coordinating multiple unmanned high-altitude platform stations (HAPS) or unmanned aerial vehicles (UAVs) that provide communications area coverage to a community of fixed and mobile users. Multi-agent coordination techniques are essential for developing autonomous capabilities for multi-UAV/HAPS control and management. This paper examines the impact of the exploration–exploitation dilemma on the application of RL to coordinating multiple UAVs/HAPS. It is observed that RL convergence is a challenge, as the algorithm struggles to find optimal positions for maximum user coverage. The paper attempts to establish the source of this convergence issue for this specific application scenario, suggests methods to mitigate its impact, and offers insights on applying RL techniques to multi-agent coordination for communications area coverage.
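The abstract does not specify the authors' algorithm or exploration strategy, so the following is only a minimal illustrative sketch of the exploration–exploitation dilemma it refers to, assuming ε-greedy action-value learning for a single agent choosing a hover position. The candidate cells, coverage values, and parameters are all hypothetical, not the paper's implementation.

```python
import random

# Hypothetical toy setting: a UAV picks one of N candidate grid cells to
# hover over; each cell covers a fixed number of users. This reduces the
# positioning problem to a single-state (bandit) decision for clarity.
USERS_COVERED = [3, 7, 2, 9, 5]  # assumed users covered from each cell
N_ACTIONS = len(USERS_COVERED)

def run(epsilon, episodes=2000, alpha=0.1, seed=0):
    """Epsilon-greedy value learning; returns the final greedy cell and Q."""
    rng = random.Random(seed)
    q = [0.0] * N_ACTIONS
    for _ in range(episodes):
        # The exploration-exploitation dilemma: with probability epsilon
        # try a random cell (explore), otherwise hover over the cell with
        # the highest estimated coverage (exploit).
        if rng.random() < epsilon:
            a = rng.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda i: q[i])
        reward = USERS_COVERED[a] + rng.gauss(0, 1.0)  # noisy coverage reward
        q[a] += alpha * (reward - q[a])                # incremental update
    return max(range(N_ACTIONS), key=lambda i: q[i]), q

for eps in (0.0, 0.1, 0.5):
    best, q = run(eps)
    print(f"epsilon={eps}: greedy cell={best}, Q={[round(v, 2) for v in q]}")
```

Varying epsilon reproduces the convergence trade-off the paper studies: with epsilon = 0 the agent can lock onto whichever cell it sampled first and never discover the best position, while a large epsilon yields accurate estimates but keeps the agent moving away from the best position it has found.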



Author information

Correspondence to Ogbonnaya Anicho.


Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Anicho, O., Charlesworth, P.B., Baicher, G.S., Nagar, A.K. (2020). Reinforcement Learning for Multiple HAPS/UAV Coordination: Impact of Exploration–Exploitation Dilemma on Convergence. In: Nagar, A., Deep, K., Bansal, J., Das, K. (eds) Soft Computing for Problem Solving 2019. Advances in Intelligent Systems and Computing, vol 1138. Springer, Singapore. https://doi.org/10.1007/978-981-15-3290-0_12
