On the Effectiveness of Regularization Methods for Soft Actor-Critic in Discrete-Action Domains


Abstract:

Soft actor-critic (SAC) is a reinforcement learning algorithm that employs the maximum entropy framework to train a stochastic policy. This work examines a specific failure case of SAC in which the stochastic policy is trained to maximize expected entropy in a sparse-reward environment. We demonstrate that the over-exploration of SAC can cause the entropy temperature to collapse, followed by unstable updates to the actor. Based on our analyses, we introduce Reg-SAC, an improved version of SAC, to mitigate the detrimental effects of the entropy temperature on the learning stability of the stochastic policy. Reg-SAC incorporates a clipping value to prevent the entropy temperature from collapsing and regularizes the gradient updates of the policy via Kullback-Leibler divergence. In experiments on discrete-action benchmarks, Reg-SAC outperforms standard SAC in sparse-reward grid-world environments while maintaining competitive performance on the dense-reward Atari benchmark. These results highlight that our regularized version makes the stochastic policy of SAC more stable in discrete-action domains.
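
As a rough illustration of the two mechanisms the abstract attributes to Reg-SAC (a clipped entropy temperature and KL-regularized policy updates), the sketch below shows how they might be folded into a discrete-action SAC actor loss. It assumes PyTorch; the names ALPHA_MIN, KL_COEF, policy, and old_policy are illustrative assumptions, not the authors' implementation or exact objective.

    import torch
    import torch.nn.functional as F

    ALPHA_MIN = 1e-3   # assumed clipping floor for the entropy temperature
    KL_COEF = 0.1      # assumed weight on the KL regularizer

    def regularized_actor_loss(policy, old_policy, q_values, obs, log_alpha):
        """Discrete-action SAC actor loss with a clipped temperature and a
        KL penalty toward the previous policy (a sketch, not the paper's
        exact objective)."""
        # Clip the temperature so it cannot collapse toward zero.
        alpha = torch.clamp(log_alpha.exp(), min=ALPHA_MIN)

        logits = policy(obs)                      # shape: (batch, n_actions)
        log_probs = F.log_softmax(logits, dim=-1)
        probs = log_probs.exp()

        # Standard discrete-SAC actor term: expected (alpha * log pi - Q).
        sac_term = (probs * (alpha * log_probs - q_values)).sum(dim=-1)

        # KL(pi_new || pi_old) regularizer to damp large policy updates.
        with torch.no_grad():
            old_log_probs = F.log_softmax(old_policy(obs), dim=-1)
        kl = (probs * (log_probs - old_log_probs)).sum(dim=-1)

        return (sac_term + KL_COEF * kl).mean()

In this reading, the clamp keeps the entropy bonus from vanishing when the temperature is driven toward zero, while the KL term penalizes policy updates that move too far from the previous policy in a single step.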
Published in: IEEE Transactions on Systems, Man, and Cybernetics: Systems ( Volume: 55, Issue: 2, February 2025)
Page(s): 1425 - 1438
Date of Publication: 04 December 2024

