Impact Statement:
RL algorithms integrated with deep learning architectures (called DRL) have achieved immense success in a wide range of practical applications such as robotics, game theory, and natural language processing. Deep actor–critic (AC) is one of the most popular DRL algorithms, combining the benefits of both policy-based and value-based RL methods. However, deep AC algorithms are not free from stability issues caused by high variance, which makes them less useful in critical applications such as finance. In this work, we propose an “optimal L-step AC with general approximation architecture (optimal L-AC-GAA)” algorithm, which, while yielding the optimal policy, also attains the minimum estimator variance, a result that we establish both theoretically and experimentally. To the best of our knowledge, such a result was lacking in all prior works.
Abstract:
Reinforcement learning (RL) algorithms combined with deep learning architectures have achieved tremendous success in many practical applications. However, the policies obtained by many deep reinforcement learning (DRL) algorithms are seen to suffer from high variance, which makes them less useful in safety-critical applications. In general, it is desirable to have algorithms that give a low iterate variance while providing a high long-term reward. In this work, we consider the actor–critic (AC) paradigm, in which the critic is responsible for evaluating the policy while the feedback from the critic is used by the actor to update the policy. In the standard AC procedure, the updates of the critic and the actor are run concurrently until convergence. It has previously been observed that updating the actor once after every L > 1 steps of the critic reduces the iterate variance. In this article, we address the question of which L-value is optimal to use in these recursions and propose a data-driven ...
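To make the L-step update schedule described above concrete, the following is a minimal illustrative sketch: a tabular actor–critic on a toy two-state MDP in which the critic takes a TD(0) step on every transition while the actor takes a single policy-gradient step only once every L transitions. This is not the paper's optimal L-AC-GAA algorithm; the toy MDP, the tabular/softmax parameterization, the step sizes, and the fixed choice of L are all assumptions made purely for illustration.

```python
# Illustrative sketch only (not the paper's optimal L-AC-GAA algorithm).
# Actor is updated once after every L critic (TD) updates.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 2, 2
GAMMA = 0.95

# Assumed toy MDP: P[s, a] is the next-state distribution, R[s, a] the mean reward.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.8, 0.2], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

theta = np.zeros((N_STATES, N_ACTIONS))  # actor parameters (softmax policy)
V = np.zeros(N_STATES)                   # critic estimate of the value function

def policy(s):
    """Softmax policy over actions in state s."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def step(s, a):
    """Sample a transition from the toy MDP."""
    s_next = rng.choice(N_STATES, p=P[s, a])
    return s_next, R[s, a]

L = 5                      # assumed value: actor updated once every L critic steps
alpha_c, alpha_a = 0.05, 0.01
s = 0
for t in range(20000):
    probs = policy(s)
    a = rng.choice(N_ACTIONS, p=probs)
    s_next, r = step(s, a)

    # Critic: one TD(0) update per environment step.
    td_error = r + GAMMA * V[s_next] - V[s]
    V[s] += alpha_c * td_error

    # Actor: a policy-gradient step only on every L-th iteration,
    # using the critic's TD error as the advantage estimate.
    if (t + 1) % L == 0:
        grad_log_pi = -probs
        grad_log_pi[a] += 1.0
        theta[s] += alpha_a * td_error * grad_log_pi

    s = s_next

print("learned policy:", np.round([policy(s) for s in range(N_STATES)], 3))
```

Running the sketch with different values of L shows the trade-off the abstract refers to: larger L means fewer, better-evaluated actor updates (lower iterate variance) at the cost of slower policy improvement, which is why the choice of L matters.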
Published in: IEEE Transactions on Artificial Intelligence (Volume: 5, Issue: 7, July 2024)