Abstract:
Many real-world tasks in practical control systems involve the learning and decision-making of multiple agents under limited communication and observation. In this paper, we study the problem of networked multi-agent reinforcement learning (MARL), where multiple agents perform reinforcement learning in a common environment and can exchange information via a possibly time-varying communication network. In particular, we focus on a collaborative MARL setting where each agent has an individual reward function, and the objective of all agents is to maximize the network-wide averaged long-term return. To this end, we propose a fully decentralized actor-critic algorithm that relies only on neighbor-to-neighbor communication among agents. To promote the use of the algorithm in practical control systems, we focus on the setting with continuous state and action spaces, and adopt the recently proposed expected policy gradient to reduce the variance of the gradient estimate. We provide convergence guarantees for the algorithm when linear function approximation is employed, and corroborate our theoretical results via simulations.
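To make the "fully decentralized, neighbor-to-neighbor" idea concrete, the following is a minimal sketch of one consensus-based critic step with linear function approximation: each agent first performs a local temporal-difference update using only its own reward, then mixes its critic weights with those of its neighbors through a doubly-stochastic matrix matched to the communication graph. This is an illustration under stated assumptions, not the paper's exact update rules; the function and parameter names (consensus_critic_step, W, beta) are hypothetical.

```python
import numpy as np

def consensus_critic_step(w, features, td_errors, W, beta=0.01):
    """One networked critic update for N agents (a hypothetical sketch).

    w         : (N, d) array; row i holds agent i's linear critic weights
    features  : (d,) shared state-feature vector phi(s)
    td_errors : (N,) each agent's local TD error, computed from its OWN reward
    W         : (N, N) doubly-stochastic consensus matrix; W[i, j] > 0 only
                if j is a neighbor of i in the (possibly time-varying) graph
    beta      : critic step size
    """
    # Local TD step: each agent moves its weights along phi(s), scaled by
    # its private TD error, so no reward information is shared directly.
    w_local = w + beta * td_errors[:, None] * features[None, :]
    # Consensus step: each agent averages weights with its neighbors only,
    # which over time tracks the network-wide averaged value function.
    return W @ w_local

# Tiny usage example on a 3-agent ring graph (all quantities made up).
if __name__ == "__main__":
    N, d = 3, 4
    rng = np.random.default_rng(0)
    w = rng.normal(size=(N, d))
    phi = rng.normal(size=d)
    deltas = rng.normal(size=N)              # per-agent TD errors
    W = np.array([[0.5, 0.25, 0.25],         # doubly-stochastic ring weights
                  [0.25, 0.5, 0.25],
                  [0.25, 0.25, 0.5]])
    w = consensus_critic_step(w, phi, deltas, W)
    print(w.shape)  # (3, 4): updated critic weights for every agent
```

The doubly-stochastic mixing matrix is the standard device in consensus optimization for driving all agents' parameter copies toward a common value while each agent communicates only with its graph neighbors; the actor update (here omitted) would remain purely local to each agent.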
Published in: 2018 IEEE Conference on Decision and Control (CDC)
Date of Conference: 17-19 December 2018
Date Added to IEEE Xplore: 20 January 2019