Abstract:
The goal of this study is to minimize the average delay under an average energy consumption constraint in a single-queue, single-server wireless communication system with block fading channels. To this end, we formulate the problem as an infinite-horizon constrained Markov decision process (CMDP), in which the state jointly consists of the queue length and the channel condition. We apply the Lagrange multiplier method to transform the constrained optimization problem into an unconstrained one. Then, we prove that an optimal scheduling policy is nondecreasing with respect to both the queue length and the channel state. To obtain an optimal scheduling policy, we propose an efficient reinforcement learning algorithm, the Structural-Optimistic Q-learning (SOQ) algorithm, which exploits the nondecreasing property of optimal policies by using policy projection. Finally, we analyze how to control the average energy consumption to satisfy a given energy consumption constraint. The simulation results show that SOQ outperforms the traditional Q-learning algorithm in terms of the average cost during the learning phase.
Published in: IEEE Internet of Things Journal (Volume: 11, Issue: 3, 01 February 2024)
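
The two ideas named in the abstract, the Lagrangian relaxation and the projection onto nondecreasing policies, can be illustrated with a minimal sketch. This is not the authors' SOQ implementation. The function names, the use of queue length as the per-step delay cost, and the running-maximum projection are all assumptions made for illustration; the paper may define its costs and its projection differently.

```python
import numpy as np


def lagrangian_cost(queue_len, energy, lam):
    """Per-step cost of the relaxed (unconstrained) problem.

    Assumption: queue length stands in for delay, and the energy term
    is weighted by a Lagrange multiplier lam >= 0.
    """
    return queue_len + lam * energy


def project_monotone(policy):
    """Map a policy table onto one that is nondecreasing in both indices.

    policy[q, h] is the action for queue length q and channel state h.
    Assumption: a running maximum along each axis is used as a simple
    projection; it returns the smallest nondecreasing table that is
    elementwise >= the input.
    """
    policy = np.asarray(policy)
    along_queue = np.maximum.accumulate(policy, axis=0)
    return np.maximum.accumulate(along_queue, axis=1)


# Hypothetical greedy policy read off a noisy Q-table (3 queue levels x 3 channel states).
greedy = np.array([[0, 1, 0],
                   [1, 0, 2],
                   [0, 2, 1]])
projected = project_monotone(greedy)
print(projected)
```

In a learning loop, a projection of this kind would be applied to the greedy policy after each Q-table update, so that exploration is restricted to policies that have the structure the optimal policy is proven to have.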