Abstract:
Reinforcement learning belongs to a class of artificial intelligence algorithms that can be used to design adaptive optimal controllers learned online. These methods have mostly been based on state feedback, which limits their application in practical scenarios. In this paper, we present an output feedback Q-learning algorithm to solve the discrete-time linear quadratic regulator (LQR) problem. The proposed scheme learns the optimal controller online without requiring any knowledge of the system dynamics, making it completely model-free. Both policy iteration (PI) and value iteration (VI) algorithms are developed, where the latter does not require an initially stabilizing policy. The convergence of both algorithms is established. The proposed method does not require a discounting factor, which is typically introduced in the cost function to trade off between the excitation noise bias and system stability. The method is therefore exact and converges to the actual LQR control solution obtained by solving the Riccati equation. Simulation results demonstrate the effectiveness of the scheme.
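As a rough illustration of the VI flavor of Q-learning for LQR described in the abstract, here is a minimal state-feedback sketch in NumPy. It is a sketch under stated assumptions, not the paper's method: the paper's contribution is the output feedback version (which parameterizes the Q-function over measured input/output histories rather than the state), and the example plant, cost weights, and all numerical choices below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical plant, used ONLY to generate data; the learner never reads A or B.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Qc, Rc = np.eye(2), np.array([[1.0]])    # stage cost x'Qc x + u'Rc u
n, m = B.shape
d = n + m                                # z = [x; u]

def basis(z):
    """Quadratic monomials such that z' H z = theta . basis(z)."""
    i, j = np.triu_indices(d)
    return (z[i] * z[j]) * np.where(i == j, 1.0, 2.0)

def unpack(theta):
    """Rebuild the symmetric Q-function kernel H from its parameters."""
    H = np.zeros((d, d))
    H[np.triu_indices(d)] = theta
    return H + H.T - np.diag(np.diag(H))

# Off-policy exploration data: random states and inputs for persistent excitation.
N = 400
X = rng.standard_normal((N, n))
U = rng.standard_normal((N, m))
Xn = X @ A.T + U @ B.T                   # one-step successor states
Z = np.hstack([X, U])
Phi = np.array([basis(z) for z in Z])
cost = np.einsum('ij,jk,ik->i', X, Qc, X) + np.einsum('ij,jk,ik->i', U, Rc, U)

# Value iteration: no initially stabilizing gain is needed.
H = np.eye(d)
for _ in range(200):
    Hxx, Hxu, Huu = H[:n, :n], H[:n, n:], H[n:, n:]
    V = Hxx - Hxu @ np.linalg.solve(Huu, Hxu.T)     # min_u Q_j(x,u) = x' V x
    y = cost + np.einsum('ij,jk,ik->i', Xn, V, Xn)  # Bellman targets
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    H = unpack(theta)

K = np.linalg.solve(H[n:, n:], H[:n, n:].T)         # learned gain, u = -K x
print("learned gain K:", K)

# Model-based check: iterate the Riccati difference equation to its fixed point.
P = np.eye(n)
for _ in range(500):
    S = Rc + B.T @ P @ B
    P = Qc + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)
K_lqr = np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)
print("Riccati gain K:", K_lqr)
```

With noise-free data the least-squares fit recovers each Q-function kernel exactly, so the learned gain should match the Riccati gain to numerical precision, consistent with the abstract's claim that the undiscounted method converges to the exact LQR solution.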
Date of Conference: 12-15 December 2017
Date Added to IEEE Xplore: 22 January 2018