
Offline Reinforcement Learning With Reverse Diffusion Guide Policy


Abstract:

Offline reinforcement learning (ORL) learns a policy from a static dataset without further interaction with the environment, which holds significant promise for industrial control systems characterized by inefficient online interaction and inherent safety concerns. To mitigate the extrapolation error induced by distribution shift, it is essential for ORL to constrain the learned policy to actions within the support set of the behavior policy. Existing methods often fail to represent the behavior policy properly and tend to prefer actions with higher densities within the support set, resulting in a suboptimal learned policy. This article proposes a novel ORL method that represents the behavior policy with a diffusion model and trains a reverse diffusion guide policy to instruct the pretrained diffusion model in generating actions. The diffusion model exhibits stable training and strong distributional expressiveness, and the reverse diffusion guide policy can effectively explore the entire support set to help generate the optimal action. For low-quality datasets, a trainable perturbation can further be added to the generated action to help the learned policy escape the performance limitation of the behavior policy. Experimental results on the D4RL Gym-MuJoCo benchmark demonstrate the effectiveness of the proposed method, which surpasses several state-of-the-art ORL methods.
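To make the abstract's pipeline concrete, below is a minimal sketch, based only on the description above, of guided reverse-diffusion action sampling with an optional perturbation. Everything in it is an assumption for illustration: the class and function names (EpsilonNet, Guide, guided_sample), the network architectures, the timestep schedule, and the guide/perturbation scales are hypothetical and are not the paper's implementation.

```python
import torch
import torch.nn as nn

class EpsilonNet(nn.Module):
    """Stand-in for a pretrained behavior-policy diffusion model eps(a_t, s, t)."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, a_t, s, t):
        t_emb = t.float().unsqueeze(-1) / 100.0  # crude timestep embedding
        return self.net(torch.cat([a_t, s, t_emb], dim=-1))

class Guide(nn.Module):
    """Trainable guide that nudges each reverse step toward high-value actions
    inside the behavior-policy support (assumed form)."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, a_t, s, t):
        t_emb = t.float().unsqueeze(-1) / 100.0
        return self.net(torch.cat([a_t, s, t_emb], dim=-1))

@torch.no_grad()
def guided_sample(eps_net, guide, s, action_dim, T=20, guide_scale=0.1,
                  perturb=None, perturb_scale=0.05):
    """Reverse (denoising) process: start from Gaussian noise, apply standard
    DDPM posterior-mean updates, steer each step with the guide, and optionally
    add a trainable perturbation to the final action."""
    batch = s.shape[0]
    betas = torch.linspace(1e-4, 2e-2, T)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    a = torch.randn(batch, action_dim)  # a_T ~ N(0, I)
    for i in reversed(range(T)):
        t = torch.full((batch,), i, dtype=torch.long)
        eps = eps_net(a, s, t)
        # standard DDPM posterior mean for step i
        mean = (a - betas[i] / torch.sqrt(1.0 - alpha_bars[i]) * eps) / torch.sqrt(alphas[i])
        # steer the reverse step with the guide policy
        mean = mean + guide_scale * guide(a, s, t)
        noise = torch.randn_like(a) if i > 0 else torch.zeros_like(a)
        a = mean + torch.sqrt(betas[i]) * noise

    if perturb is not None:
        # helps escape the behavior policy's performance ceiling on low-quality data
        a = a + perturb_scale * perturb(torch.cat([s, a], dim=-1))
    return a.clamp(-1.0, 1.0)

# Illustrative usage with made-up dimensions (e.g., a MuJoCo-like task)
state_dim, action_dim = 17, 6
eps_net, guide = EpsilonNet(state_dim, action_dim), Guide(state_dim, action_dim)
perturb = nn.Sequential(nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                        nn.Linear(256, action_dim), nn.Tanh())
actions = guided_sample(eps_net, guide, torch.randn(4, state_dim), action_dim, perturb=perturb)
```

In this sketch the guide shifts the denoising mean at every reverse step, so samples remain close to the behavior-policy support while drifting toward higher-value actions, and the perturbation network touches only the final denoised action; how the actual method trains the guide and perturbation is not specified in the abstract.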
Published in: IEEE Transactions on Industrial Informatics ( Volume: 20, Issue: 10, October 2024)
Page(s): 11785 - 11793
Date of Publication: 27 June 2024


