Impact Statement:
Offline–online RL is a new framework for RL. It uses historical data in the offline training phase and improves policy performance in the online training phase. However, existing offline–online RL algorithms cannot handle the Q-value overestimation caused by distribution shift in the offline training phase, nor the policy performance degradation that occurs in the transition from offline to online learning. Our proposed algorithm, O2AC, addresses both of these problems effectively. Experimental results show that it outperforms current state-of-the-art offline–online RL algorithms.
Abstract:
Offline–online reinforcement learning (RL) can effectively address the problem of missing data (commonly known as transitions) in offline RL. However, due to distribution shift, policy performance may degrade when an agent moves from the offline to the online training phase. In this article, we first analyze the problems of distribution shift and policy performance degradation in offline–online RL. To alleviate these problems, we then propose a novel RL algorithm, offline–online actor–critic (O2AC). In O2AC, a behavior-clone constraint term is introduced into the policy objective function to address distribution shift in the offline training phase. In the online training phase, the influence of this constraint term is gradually reduced, which alleviates policy performance degradation. Experiments show that O2AC outperforms existing offline–online RL algorithms.
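The abstract describes an actor update regularized by a behavior-clone term whose influence decays once online training begins. The sketch below illustrates that idea only; the network shapes, the MSE form of the behavior-clone term, and the linear decay schedule are illustrative assumptions, not the paper's exact O2AC formulation.

```python
# Minimal sketch of a behavior-clone-constrained actor update with a decaying
# constraint weight, in the spirit of the O2AC description above. Shapes,
# the MSE constraint, and the decay schedule are assumptions for illustration.
import torch
import torch.nn as nn

state_dim, action_dim = 17, 6  # e.g., a continuous-control task (assumed)

actor = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                      nn.Linear(256, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                       nn.Linear(256, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)


def bc_weight(step: int, offline_steps: int, decay_steps: int) -> float:
    """Full constraint weight during offline training, then a linear decay
    to zero over `decay_steps` online steps (assumed schedule)."""
    if step < offline_steps:
        return 1.0
    return max(0.0, 1.0 - (step - offline_steps) / decay_steps)


def actor_update(states, dataset_actions, step,
                 offline_steps=100_000, decay_steps=50_000):
    """One actor step: maximize Q while staying close to dataset actions."""
    pi_actions = actor(states)
    q_values = critic(torch.cat([states, pi_actions], dim=-1))
    bc_term = ((pi_actions - dataset_actions) ** 2).mean()  # behavior-clone constraint
    alpha = bc_weight(step, offline_steps, decay_steps)
    loss = -q_values.mean() + alpha * bc_term
    actor_opt.zero_grad()
    loss.backward()
    actor_opt.step()
    return loss.item(), alpha


# Usage with random placeholder data (a real run would sample a replay buffer).
states = torch.randn(64, state_dim)
dataset_actions = torch.rand(64, action_dim) * 2 - 1
loss, alpha = actor_update(states, dataset_actions, step=120_000)
print(f"actor loss {loss:.3f}, bc weight {alpha:.3f}")
```

During the offline phase the constraint keeps the learned policy close to the data-generating behavior, mitigating overestimation from out-of-distribution actions; shrinking the weight online lets the Q-term dominate without an abrupt objective change.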
Published in: IEEE Transactions on Artificial Intelligence ( Volume: 5, Issue: 1, January 2024)