Abstract:
We analyse the behaviour of a generalised TD(λ) algorithm with constant step size. We first consider linear estimation of the optimal cost. Using a realisation-wise averaging analysis, we prove, for the first time, boundedness under a positive real condition. We also provide, for the first time, a detailed analysis of the second-order fluctuations of a TD(λ)-type algorithm. We then consider nonlinear estimation of the optimal cost.
Published in: 49th IEEE Conference on Decision and Control (CDC)
Date of Conference: 15-17 December 2010
Date Added to IEEE Xplore: 22 February 2011