Abstract:
States with strong spatiotemporal coupling frequently appear in robotic tasks, and this coupling enriches the information encapsulated in each state. Exploiting historical observations can provide more information about the robot, especially in partially observable Markov decision processes. How to handle this coupling remains a challenging issue in robotic reinforcement learning (RL), and we argue that imbalanced processing of spatial and temporal details is one bottleneck of the vanilla transformer model in learning robotic policies. To address this problem, we propose a novel, efficient spatiotemporal transformer structure. To our knowledge, this work is the first to improve the transformer with spatiotemporal information in RL. In each attention block, we execute the attention computation twice in sequence: the first pass processes the temporal sequence of the input, and the second handles the spatial state. This input reconstruction enables sufficient information extraction and promotes data efficiency. We also add correlation encoding into the query and key computation of multi-head attention, making it possible to associate states both between and within time steps. We evaluate the proposed approach on several robot tasks, where it outperforms state-of-the-art transformer-based online RL.
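The two-pass attention scheme described in the abstract might be sketched as follows. This is a minimal single-head numpy illustration under our own assumptions (function and parameter names are hypothetical, and the paper's correlation encoding and multi-head details are omitted): temporal attention first attends over the T time steps for each spatial token, then spatial attention attends over the S spatial tokens within each time step.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, Wq, Wk, Wv):
    # Single-head scaled dot-product attention over the second-to-last axis.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def spatiotemporal_block(x, params):
    # x: (T, S, d) — T time steps, S spatial tokens, d features.
    # Pass 1: temporal attention — attend over T for each spatial token.
    xt = np.swapaxes(x, 0, 1)                      # (S, T, d)
    xt = xt + attention(xt, *params["temporal"])   # residual connection
    # Pass 2: spatial attention — attend over S within each time step.
    xs = np.swapaxes(xt, 0, 1)                     # (T, S, d)
    xs = xs + attention(xs, *params["spatial"])    # residual connection
    return xs                                      # (T, S, d)

rng = np.random.default_rng(0)
d = 8
params = {k: tuple(rng.standard_normal((d, d)) * 0.1 for _ in range(3))
          for k in ("temporal", "spatial")}
out = spatiotemporal_block(rng.standard_normal((5, 4, d)), params)
print(out.shape)  # (5, 4, 8): shape is preserved through both passes
```

A real implementation would use multi-head attention with layer normalization and feed-forward sublayers; the sketch only shows the ordering of the two attention passes and the axis swaps that route them over time and space respectively.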
Published in: IEEE Robotics and Automation Letters (Volume: 7, Issue: 3, July 2022)