Abstract:
This letter optimizes the resource blocks allocated to channel estimation and data transmission in multiple-input single-output (MISO) ultra-reliable low-latency communication (URLLC) systems. The goal is to improve resource utilization efficiency subject to a reliability constraint. Because wireless channels are correlated in the temporal domain and channel estimation is not error-free, the problem is formulated as a partially observable Markov decision process (POMDP), that is, a sequential decision-making problem with partial observations. To solve this problem, we develop a constrained deep reinforcement learning (DRL) algorithm, namely Cascaded-Action Twin Delayed Deep Deterministic policy (CA-TD3). We train the policy using a primal-domain method and compare it with a primal-dual method and an existing benchmark. Two channel models are considered for evaluation: the first-order autoregressive correlated channel model and the clustered delay line (CDL) channel model. The simulation results show that the proposed primal CA-TD3 method converges faster than the primal-dual method and achieves over 30% improvement in resource utilization efficiency compared to the benchmark.
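As an illustration of the temporal correlation mentioned above, the following is a minimal sketch of a first-order autoregressive (Gauss-Markov) Rayleigh-fading MISO channel, h[t] = rho * h[t-1] + sqrt(1 - rho^2) * w[t]. The abstract does not state the model's parameters, so the function name `ar1_channel`, the correlation coefficient `rho`, the unit-variance normalization, and the `seed` argument are all assumptions made for illustration; the letter's exact parametrization may differ.

```python
import math
import random


def ar1_channel(num_slots, num_antennas, rho, seed=0):
    """Simulate an AR(1) correlated MISO channel (illustrative sketch only).

    Assumed model: h[t] = rho * h[t-1] + sqrt(1 - rho**2) * w[t],
    where w[t] has i.i.d. CN(0, 1) entries, so each entry of h[t]
    keeps unit variance in every slot.
    """
    if num_slots < 1 or num_antennas < 1:
        raise ValueError("num_slots and num_antennas must be positive")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")

    rng = random.Random(seed)
    component_std = math.sqrt(0.5)  # real and imaginary parts each have variance 1/2

    def cn01():
        # One sample of a unit-variance circularly symmetric complex Gaussian.
        return complex(rng.gauss(0.0, component_std), rng.gauss(0.0, component_std))

    # Initial channel vector, one coefficient per transmit antenna.
    h = [cn01() for _ in range(num_antennas)]
    trajectory = [h]

    innovation_scale = math.sqrt(1.0 - rho ** 2)
    for _ in range(num_slots - 1):
        # Each new slot mixes the previous channel with a fresh innovation term.
        h = [rho * x + innovation_scale * cn01() for x in h]
        trajectory.append(h)
    return trajectory


if __name__ == "__main__":
    for t, h in enumerate(ar1_channel(num_slots=3, num_antennas=2, rho=0.9)):
        print(t, [f"{x:.3f}" for x in h])
```

With `rho` close to 1 the channel changes slowly between slots, so an earlier estimate remains informative for longer; with `rho = 0` every slot is an independent draw.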
Published in: IEEE Wireless Communications Letters (Volume: 12, Issue: 10, October 2023)