Abstract:
Graph search over states and actions is a valuable tool for robotic planning and navigation. However, the required computation is sensitive to the size of the state and action spaces, a problem further exacerbated in multi-agent planning by the number of agents and by sparse reward signals that depend on the cooperation of agents. To tackle these problems, we introduce an algorithm that is pre-trained in a centralized fashion but implemented on robots in a distributed way at runtime. The centralized portion uses imitation learning to iteratively construct policies that guide each agent's own runtime search and predict other agents' future actions by exploiting previously discovered joint actions. Our algorithm includes a novel method of tree search over a mixture of the individual and joint action spaces, which can be interpreted as a cascading effect in which agents are biased by exploration of new actions, exploitation of previously profitable ones, and recommendations from deep neural networks. Simulations show the efficacy of the proposed method in cooperative scenarios with sparse rewards.
Published in: IEEE Robotics and Automation Letters (Volume 5, Issue 2, April 2020)
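The three biases named in the abstract (exploration of new actions, exploitation of profitable ones, and a neural-net recommendation) resemble a PUCT-style selection rule from Monte Carlo tree search. The following is a minimal sketch of such a rule under that assumption; it is not the paper's implementation, and all names (`select_action`, `node_stats`, `priors`, `c_explore`) are hypothetical:

```python
import math

def select_action(node_stats, priors, c_explore=1.0):
    """Pick the action that balances exploitation, exploration,
    and a policy-network prior (PUCT-style; illustrative only).

    node_stats: dict mapping action -> (total_value, visit_count)
    priors:     dict mapping action -> policy-network probability
    """
    total_visits = sum(n for _, n in node_stats.values())
    best, best_score = None, -math.inf
    for a, (w, n) in node_stats.items():
        q = w / n if n > 0 else 0.0  # exploitation: mean value of the action
        # exploration + recommendation: rarely visited actions and actions
        # favored by the network prior receive a larger bonus
        u = c_explore * priors[a] * math.sqrt(total_visits) / (1 + n)
        if q + u > best_score:
            best, best_score = a, q + u
    return best
```

In the paper's setting one could imagine such a score being evaluated over a mixture of an agent's individual actions and previously discovered joint actions, so that the search cascades from cheap individual-space candidates toward profitable cooperative ones.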