
Finite-Time Error Bounds of Biased Stochastic Approximation With Application to TD-Learning


Abstract:

Motivated by the recent success of reinforcement learning algorithms, this paper studies a class of biased stochastic approximation (SA) procedures under a mild “ergodicity-like” assumption on the random noise sequence. Building on a multistep Lyapunov function that looks ahead to several future updates to accommodate the stochastic perturbations (thus gaining control over the bias), we prove a general result on the convergence of the SA iterates, and use it to derive non-asymptotic bounds on the mean-square error in the case of constant stepsizes. This novel viewpoint renders finite-time analysis of biased SA algorithms under a family of stochastic perturbations possible. For direct comparison with prior work, we demonstrate these bounds by applying them to TD-learning with linear function approximation, under the Markov chain observation model. The resultant finite-time error bound for TD-learning is the first of its kind, in the sense that it holds i) for the unmodified versions (i.e., without any modification to the updates) using even nonlinear approximators; as well as for Markov chains ii) under sublinear mixing conditions and iii) starting from any initial distribution, at least one of which has to be violated for existing results to be applicable.
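To make the setting concrete, the following is a minimal sketch of unmodified TD(0) with linear function approximation and a constant stepsize, run along a single Markov chain trajectory. It is not the authors' code and does not reproduce their analysis. The three-state chain `P`, rewards `r`, feature matrix `Phi`, discount `gamma`, stepsize `alpha`, and both function names are illustrative assumptions. Because consecutive samples come from one trajectory, the update direction is a biased estimate of its stationary mean, which is the situation the abstract describes.

```python
import numpy as np

def td0_linear(P, r, Phi, gamma, alpha, num_steps, s0=0, seed=0):
    """TD(0) with linear features along one Markov trajectory (constant stepsize)."""
    rng = np.random.default_rng(seed)
    n_states = P.shape[0]
    theta = np.zeros(Phi.shape[1])
    s = s0  # arbitrary initial state, not drawn from the stationary distribution
    for _ in range(num_steps):
        s_next = rng.choice(n_states, p=P[s])
        # Temporal-difference error for the observed transition (s, r[s], s_next).
        td_error = r[s] + gamma * (Phi[s_next] @ theta) - Phi[s] @ theta
        # Plain update: no projection step and no resampling from stationarity.
        theta = theta + alpha * td_error * Phi[s]
        s = s_next
    return theta

def td_fixed_point(P, r, Phi, gamma):
    """Solve Phi^T D (Phi - gamma P Phi) theta = Phi^T D r, D = diag(stationary dist)."""
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    pi = pi / pi.sum()
    D = np.diag(pi)
    A = Phi.T @ D @ (Phi - gamma * (P @ Phi))
    b = Phi.T @ D @ r
    return np.linalg.solve(A, b)

# Hypothetical example problem (all values are assumptions for illustration).
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
r = np.array([1.0, 0.0, -1.0])
Phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])
gamma, alpha = 0.5, 0.005

theta = td0_linear(P, r, Phi, gamma, alpha, num_steps=20000)
theta_star = td_fixed_point(P, r, Phi, gamma)
print("distance to TD fixed point:", np.linalg.norm(theta - theta_star))
```

With a constant stepsize the iterate is expected to settle in a neighborhood of the TD fixed point rather than converge to it exactly. Shrinking `alpha` should tighten that neighborhood at the cost of a slower transient, which is the kind of trade-off a finite-time mean-square error bound quantifies.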

Published in: IEEE Transactions on Signal Processing (Volume: 70)

Page(s): 950 - 962

Date of Publication: 17 November 2021

DOI: 10.1109/TSP.2021.3128723
