Abstract:
Reinforcement Learning (RL) techniques use a reward function to guide a learning agent in solving sequential decision-making problems through interaction with a dynamic environment, but the reward function is hard to design for complex problems. This difficulty motivates Inverse Reinforcement Learning (IRL), which derives the reward function from an expert's demonstrations. IRL typically assumes that the demonstrations are meaningful and reproducible. However, demonstrations of failure are not entirely useless. In this paper, a unified method of combining opposite demonstrations is proposed: the robot is taught by showing it inappropriate demonstrations or by trying to exhibit unrelated behaviors, so that the agent can deliberately avoid such bad situations and speed up learning. Simulation results show that the algorithm combined with demonstrations of failure performs better than the one that uses only good demonstrations. The method is not only convenient to operate but also saves a lot of learning time.
Date of Conference: 14-17 March 2016
Date Added to IEEE Xplore: 26 May 2016