Abstract:
We propose an adaptive state aggregation scheme to be used along with temporal-difference reinforcement learning and value function approximation algorithms. The resulting algorithm constitutes a two-timescale stochastic approximation algorithm with: (a) a fast component that executes a temporal-difference reinforcement learning algorithm, and (b) a slow component, based on online vector quantization, that adaptively partitions the state space of a Markov Decision Process according to an appropriately defined dissimilarity measure. We study the convergence of the proposed methodology using Bregman Divergences as dissimilarity measures that can increase the efficiency and reduce the computational complexity of vector quantization algorithms. Finally, we quantify its performance on the Cart-pole (inverted pendulum) optimal control problem using Q-learning with adaptive state aggregation based on the Self-Organizing Map (SOM) algorithm.
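The following is a minimal illustrative sketch (not the authors' implementation) of the two-timescale scheme described above: a fast Q-learning update over aggregate states combined with a slow SOM-style online vector quantization of the continuous state. It assumes a gymnasium CartPole-v1 environment and the squared Euclidean distance (a Bregman divergence); the prototype count, step sizes, and schedules are hypothetical choices for illustration only.

```python
import numpy as np
import gymnasium as gym

rng = np.random.default_rng(0)
env = gym.make("CartPole-v1")

K = 64                                        # number of aggregate states (SOM prototypes)
n_actions = env.action_space.n
protos = rng.normal(scale=0.1, size=(K, 4))   # prototype vectors in R^4 (cart-pole state)
Q = np.zeros((K, n_actions))                  # Q-table over aggregate states

gamma = 0.99
alpha = 0.1        # fast timescale: Q-learning step size
beta0 = 0.01       # slow timescale: vector-quantization step size
eps = 0.1          # epsilon-greedy exploration

def aggregate(x):
    # nearest prototype under squared Euclidean distance (a Bregman divergence)
    return int(np.argmin(np.sum((protos - x) ** 2, axis=1)))

for episode in range(500):
    x, _ = env.reset(seed=episode)
    s = aggregate(x)
    done, t = False, 0
    while not done:
        a = env.action_space.sample() if rng.random() < eps else int(np.argmax(Q[s]))
        x_next, r, terminated, truncated, _ = env.step(a)
        done = terminated or truncated
        s_next = aggregate(x_next)

        # fast component: standard Q-learning update on the aggregated MDP
        target = r + (0.0 if terminated else gamma * np.max(Q[s_next]))
        Q[s, a] += alpha * (target - Q[s, a])

        # slow component: SOM-style online vector quantization, moving the
        # winning prototype (and, in a full SOM, its neighbors) toward the
        # observed state
        beta = beta0 / (1.0 + 1e-4 * t)
        protos[s_next] += beta * (x_next - protos[s_next])

        x, s, t = x_next, s_next, t + 1
```

In this sketch the quantization step size decays faster than the Q-learning step size is applied per visit, reflecting the two-timescale structure; a faithful reproduction of the paper's method would follow its specific step-size conditions and neighborhood schedule.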
Published in: 2021 American Control Conference (ACC)
Date of Conference: 25-28 May 2021
Date Added to IEEE Xplore: 28 July 2021