Abstract:
The flocking and navigation control of large-scale Unmanned Aerial Vehicle (UAV) swarms has received considerable research interest due to the wide application of UAVs in many fields. Compared to traditional non-learning-based flocking and navigation control methods, reinforcement learning-based methods offer the advantages of being model-free, flexible, and adaptable. In this paper, we formulate the flocking and navigation control of the UAV swarm as a Markov Decision Process (MDP) and use multi-agent reinforcement learning methods to solve the problem. Reinforcement learning introduces two significant challenges: scalability and the partial observability of each UAV. To tackle the scalability issue, we adopt an independent learning and parameter sharing scheme, which extends single-agent reinforcement learning algorithms to the multi-agent scenario. To handle partial observability, we propose an oracle-guided two-stage training and execution scheme, which utilizes the flock center during the training phase but avoids any dependence on the flock center during the execution phase. We design oracle-guided observations and rewards and build a highly efficient simulation environment for our experiments. Simulation results show that the policy trained with our method performs well with up to thirty-two UAVs and outperforms the policy trained with local observations only.
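To make the two schemes named above concrete, here is a minimal PyTorch sketch of independent learning with parameter sharing combined with an oracle-guided observation. It is an illustrative reading, not the paper's implementation: the names (SharedPolicy, oracle_guided_obs), the network sizes, and the local fallback for the flock center at execution time are our assumptions.

```python
import torch
import torch.nn as nn

class SharedPolicy(nn.Module):
    """One policy network whose parameters are shared by every UAV
    (independent learning with parameter sharing): each UAV feeds
    its own observation through the same weights."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def oracle_guided_obs(own_state, neighbor_positions, flock_center, training):
    """Build one UAV's observation. During training the true flock
    center (the oracle) is appended; at execution it is replaced by a
    purely local estimate (here, the mean of observed neighbor
    positions, an assumption of this sketch), so the deployed policy
    never depends on the oracle."""
    center = flock_center if training else neighbor_positions.mean(dim=0)
    return torch.cat([own_state, neighbor_positions.flatten(), center])

# Toy usage: 4 UAVs, each observing its 3 nearest neighbors in 3-D.
n_uavs, n_neighbors = 4, 3
own_dim = 6                                  # e.g. position + velocity
obs_dim = own_dim + 3 * n_neighbors + 3      # own + neighbors + center
policy = SharedPolicy(obs_dim, act_dim=3)
flock_center = torch.zeros(3)
for _ in range(n_uavs):  # every UAV acts through the same shared network
    own = torch.randn(own_dim)
    neighbors = torch.randn(n_neighbors, 3)
    obs = oracle_guided_obs(own, neighbors, flock_center, training=True)
    action = policy(obs)
```

Because the same network serves all UAVs, the parameter count stays constant as the swarm grows, which is what makes the parameter sharing scheme attractive for scalability.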
Published in: IEEE Transactions on Vehicular Technology (Volume: 71, Issue: 10, October 2022)