Abstract:
This paper presents a novel generalized value iteration (VI) technique, a reinforcement learning (RL) scheme for solving continuous-time (CT) discounted linear quadratic regulation (LQR) problems online without exact knowledge of the system matrix A. The proposed method considers a discounted value function, a general setting in RL frameworks that has not been fully considered in RL for CT dynamical systems. Moreover, a stepwise-varying learning rate is introduced for fast and safe convergence. In relation to this learning rate, we also discuss the locations of the poles of the closed-loop system and monotone convergence to the optimal solution. The results of these discussions give conditions for the stability and monotone convergence of existing VI methods.
Published in: 49th IEEE Conference on Decision and Control (CDC)
Date of Conference: 15-17 December 2010
Date Added to IEEE Xplore: 22 February 2011