Bias-corrected Q-learning to control max-operator bias in Q-learning

Abstract:

We identify a class of stochastic control problems with highly random rewards and high discount factor which induce high levels of statistical error in the estimated action-value function. This produces significant levels of max-operator bias in Q-learning, which can induce the algorithm to diverge for millions of iterations. We present a bias-corrected Q-learning algorithm with asymptotically unbiased resistance against the max-operator bias, and show that the algorithm asymptotically converges to the optimal policy, as Q-learning does. We show experimentally that bias-corrected Q-learning performs well in a domain with highly random rewards where Q-learning and other related algorithms suffer from the max-operator bias.
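The max-operator bias mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the paper's bias-corrected algorithm; it is only a minimal demonstration, under assumed settings, that taking the maximum over noisy value estimates overestimates the true maximum, and that the overestimate grows with reward noise. The function names, the Gaussian reward model, and all parameter values are hypothetical choices made for illustration.

```python
import random
import statistics


def max_of_sample_means(num_actions, samples_per_action, reward_std, rng):
    """Return the max over actions of each action's sample-mean reward.

    Assumption for illustration: every action has true mean reward 0,
    so the true maximum action value is exactly 0.
    """
    means = []
    for _ in range(num_actions):
        draws = [rng.gauss(0.0, reward_std) for _ in range(samples_per_action)]
        means.append(statistics.fmean(draws))
    return max(means)


def estimate_max_operator_bias(num_actions=10, samples_per_action=5,
                               reward_std=1.0, trials=2000, seed=0):
    """Monte Carlo estimate of E[max_a Q_hat(a)] - max_a E[Q_hat(a)].

    Because the true maximum is 0, the average of the noisy maxima
    is itself the estimated bias.
    """
    rng = random.Random(seed)
    maxima = [
        max_of_sample_means(num_actions, samples_per_action, reward_std, rng)
        for _ in range(trials)
    ]
    return statistics.fmean(maxima)


# Compare a low-noise and a high-noise reward setting.
bias_low_noise = estimate_max_operator_bias(reward_std=1.0)
bias_high_noise = estimate_max_operator_bias(reward_std=10.0)
# With a single action there is no max over alternatives, so no bias is expected.
bias_single_action = estimate_max_operator_bias(num_actions=1)

print("bias, reward_std=1: ", bias_low_noise)
print("bias, reward_std=10:", bias_high_noise)
print("bias, one action:   ", bias_single_action)
```

In this toy setting the estimated bias is positive whenever there are several actions and it scales with the reward standard deviation, which is consistent with the abstract's point that highly random rewards aggravate the problem. A high discount factor would compound the effect in Q-learning by propagating the overestimate through successive updates, which this one-step sketch does not model.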

Published in: 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Date of Conference: 16-19 April 2013

Date Added to IEEE Xplore: 30 September 2013

Electronic ISBN: 978-1-4673-5925-2

DOI: 10.1109/ADPRL.2013.6614994

Conference Location: Singapore
