Abstract:
In this letter, we study the problem of dynamic spectrum anti-jamming access with an exponentially growing action space. Traditional deep reinforcement learning methods, which are restricted to scenarios with relatively small action spaces, perform poorly when the action space is large because of low exploration efficiency. To address this challenge, we propose an efficient algorithm called Proximal Policy Optimization with Action Branching and Dynamic Action Masking (PPO-ABM). To make the number of output nodes in the neural network grow linearly, the joint action space is decoupled using an action branching architecture. A sequential decision scheme based on dynamic action masking is further proposed to eliminate invalid actions and accelerate convergence. Simulation results show that PPO-ABM converges rapidly and achieves near-optimal performance despite the exponentially growing action space. With 11^{5} actions, the performance of PPO-ABM is 53.14% higher than that of the baseline.
Published in: IEEE Wireless Communications Letters (Volume: 13, Issue: 10, October 2024)
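
The two ideas named in the abstract, action branching and sequential dynamic action masking, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the uniform placeholder logits, and the masking rule are all assumptions made for illustration. In particular, the abstract does not say which actions count as invalid; the sketch assumes that a channel already picked by an earlier branch is invalid for later branches. The PPO training loop is omitted entirely.

```python
import math
import random


def output_nodes(num_branches, actions_per_branch):
    """Compare output-layer sizes for a joint head and a branched head."""
    joint = actions_per_branch ** num_branches    # one node per joint action
    branched = num_branches * actions_per_branch  # one small head per branch
    return joint, branched


def masked_softmax(logits, mask):
    """Softmax over valid entries only; masked entries get probability 0."""
    valid = [l for l, m in zip(logits, mask) if m]
    if not valid:
        raise ValueError("every action is masked")
    peak = max(valid)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_joint_action(branch_logits, rng):
    """Sample one sub-action per branch, in order, updating the mask each step.

    Hypothetical masking rule (an assumption, not taken from the letter):
    a channel chosen by an earlier branch is invalid for later branches.
    """
    num_actions = len(branch_logits[0])
    mask = [True] * num_actions
    action = []
    for logits in branch_logits:
        probs = masked_softmax(logits, mask)
        choice = rng.choices(range(num_actions), weights=probs, k=1)[0]
        action.append(choice)
        mask[choice] = False  # dynamic mask: block this channel from now on
    return action


# 5 branches with 11 sub-actions each gives the 11^5 joint actions
# mentioned in the abstract.
joint, branched = output_nodes(5, 11)

# Placeholder logits; a trained policy network would supply these.
rng = random.Random(0)
logits = [[0.0] * 11 for _ in range(5)]
action = sample_joint_action(logits, rng)
```

With 5 branches of 11 sub-actions, a joint head needs 161051 output nodes while the branched head needs 55, which is the linear growth the abstract refers to. Masking before sampling means invalid sub-actions are never explored, which is one plausible reading of how the scheme speeds up convergence.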