Authors:
Chenxing Li 1,2; Yinlong Liu 3; Zhenshan Bing 3; Fabian Schreier 1,2; Jan Seyler 2 and Shahram Eivazi 1,2
Affiliations:
1 University of Tübingen, Tübingen, Germany
2 Festo, Esslingen, Germany
3 Technical University of Munich, Munich, Germany
Keyword(s):
Q-function Targets Via Optimization, Data Efficiency, Hindsight Goals Techniques, Offline Data Collection, Dynamic Buffer.
Abstract:
In this paper, we examine three extensions to the Q-function Targets via Optimization (QT-Opt) algorithm and empirically study their effects on training time for complex robotic tasks. The vanilla QT-Opt algorithm requires large amounts of offline data (several months of collection with multiple robots), which is hard to obtain in practice. To bridge the gap between basic reinforcement learning research and real-world robotic applications, we first propose to use hindsight goal techniques (Hindsight Experience Replay, Hindsight Goal Generation) and Energy-Based Prioritization (EBP) to increase data efficiency in reinforcement learning. We then propose an efficient offline data collection method based on PD control together with a dynamic buffer. Our experiments show that data collection and agent training for a robotic grasping task together take only about one day, while learning performance remains high (80% success rate). This work serves as a step towards accelerating the training of reinforcement learning for complex real-world robotics tasks.
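The abstract names Hindsight Experience Replay (HER) as one of the data-efficiency techniques. As a rough illustration of the relabeling idea behind HER (not the paper's implementation), here is a minimal Python sketch assuming goal-conditioned transitions stored as (obs, action, reward, next_obs, goal) tuples whose observations are dicts with an "achieved_goal" entry; the names relabel_with_her and compute_reward are illustrative assumptions.

```python
import random

def relabel_with_her(episode, compute_reward, k=4):
    """HER relabeling sketch ('future' strategy): alongside each original
    transition, store k copies whose goal is replaced by a goal actually
    achieved later in the same episode, so even failed episodes yield
    reward-bearing training data for a goal-conditioned agent."""
    relabeled = []
    for t, (obs, action, reward, next_obs, goal) in enumerate(episode):
        relabeled.append((obs, action, reward, next_obs, goal))
        # Sample k achieved goals from the current step onward.
        future_steps = episode[t:]
        for _ in range(k):
            _, _, _, future_next_obs, _ = random.choice(future_steps)
            new_goal = future_next_obs["achieved_goal"]
            # Recompute the reward with respect to the substituted goal.
            new_reward = compute_reward(next_obs["achieved_goal"], new_goal)
            relabeled.append((obs, action, new_reward, next_obs, new_goal))
    return relabeled
```

The relabeled transitions would then be pushed into the replay buffer in the usual way; techniques such as Energy-Based Prioritization change only how transitions are sampled from that buffer, not how they are stored.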