Abstract:
In recent years, unmanned aerial vehicles (UAVs) have been considered for integration into wireless communication systems because of their advantages in mobility, cost, and maneuverability. In some real UAV-assisted communication scenarios, the dynamics of the environment, such as the roaming of served users, make it hard to obtain an optimal trajectory before the UAV is dispatched. It is therefore necessary to embed an intelligent control policy in the UAV so that it can execute the task in a distributed manner. In this article, a UAV trajectory design problem is investigated for an orthogonal-frequency-division-multiplexing (OFDM) wireless sensor network, which is dynamic because mobile sensors may randomly roam within a certain range. The UAV is expected to balance task efficiency against the safety constraint using a pretrained onboard control policy. Compared to prior works, this work requires the policy to adapt to randomly generated obstacle maps and assumes that the UAV has no prior knowledge of the obstacles before it is dispatched, which makes the problem more challenging. The motivation comes from adversarial environments in which the specific obstacle distribution is not known beforehand, such as a disaster area. The problem is formulated as a constrained Markov decision process (CMDP), which extends a basic Markov decision process with a safety constraint. Because of the randomized obstacle distribution and the lack of prior knowledge, existing CMDP algorithms cannot be applied directly. To tackle this issue, we enhance the reinforcement learning (RL) algorithm with a safety control mechanism to derive our novel safe RL (Safe RL) algorithm, which is based on the framework of the Lagrangian method. Compared to former CMDP algorithms, our algorithm eliminates the premise that the safety model is known: the agent is able to learn safety judgment from scratch through its interactions with the environment.
Simulation results demonstrate ...
Published in: IEEE Internet of Things Journal (Volume: 11, Issue: 21, 01 November 2024)
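
The abstract states that the Safe RL algorithm is based on the Lagrangian method for a CMDP but does not give details. As a rough illustration only, the sketch below shows the generic Lagrangian idea: the task reward is penalized by a multiplier times the safety cost, and the multiplier is raised whenever the average cost exceeds its limit. This is not the authors' algorithm; the function names, the learning rate, and the cost limit are hypothetical and chosen purely for illustration.

```python
def lagrangian_reward(reward: float, cost: float, lam: float) -> float:
    """Penalized reward r - lam * c that a policy update would maximize.

    `cost` is an assumed per-step safety cost (e.g. 1.0 on a collision).
    """
    return reward - lam * cost


def update_multiplier(lam: float, avg_cost: float, cost_limit: float,
                      lr: float = 0.05) -> float:
    """One dual-ascent step on the Lagrange multiplier.

    The multiplier grows when the average cost exceeds the limit and
    shrinks otherwise; it is clipped at zero so it stays non-negative.
    """
    return max(0.0, lam + lr * (avg_cost - cost_limit))


if __name__ == "__main__":
    lam = 0.0
    # Hypothetical average safety costs observed over successive batches.
    for avg_cost in [0.5, 0.4, 0.3, 0.1]:
        lam = update_multiplier(lam, avg_cost, cost_limit=0.2)
        print(f"avg_cost={avg_cost:.1f} -> lambda={lam:.3f}")
```

In this generic scheme, a larger multiplier makes unsafe actions less attractive to the policy, which is one common way to trade task efficiency against a safety constraint.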