Abstract:
Policy gradient algorithms are often used for continuous control problems. As model-free methods, however, they suffer from low data efficiency and a long learning phase. In this paper, a policy gradient with Gaussian process modelling (PGGPM) algorithm is proposed to accelerate the learning process. The system model is approximated incrementally by a Gaussian process, which is then used to explore the state-action space virtually by generating imaginary samples. Both the real and the imaginary samples are used to train the actor and critic networks. Finally, we apply the algorithm in two experiments to verify that the Gaussian process can accurately fit the system model and that the supplementary imaginary samples speed up the learning phase.
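The idea of fitting a Gaussian process to observed transitions and then querying it for imaginary samples can be illustrated with a minimal sketch. This is not the paper's implementation. The toy dynamics `true_step`, the `GPModel` class, the RBF kernel with unit length scale, the noise level, and the uniform sampling of state-action pairs are all assumptions made for illustration. The "incremental" update here simply refits on all data after each new transition, and actor-critic training is omitted.

```python
import math
import random

def rbf(x, y, length=1.0):
    """Squared-exponential kernel between two input vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * d2 / length ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

class GPModel:
    """Zero-mean GP regression from (state, action) to next state (hypothetical helper)."""

    def __init__(self, noise=1e-4):
        self.X, self.y, self.alpha, self.noise = [], [], [], noise

    def add(self, state, action, next_state):
        """Append one real transition and refit the kernel weights on all data."""
        self.X.append((state, action))
        self.y.append(next_state)
        K = [[rbf(a, b) + (self.noise if i == j else 0.0)
              for j, b in enumerate(self.X)] for i, a in enumerate(self.X)]
        self.alpha = solve(K, self.y)

    def predict(self, state, action):
        """Posterior mean of the next state at a query (state, action)."""
        return sum(w * rbf((state, action), x) for w, x in zip(self.alpha, self.X))

def true_step(s, a):
    """Toy 1-D system standing in for the real environment (assumption)."""
    return 0.9 * s + 0.5 * a

random.seed(0)
model = GPModel()
real = []
for _ in range(30):  # collect real transitions and update the model one at a time
    s, a = random.uniform(-1, 1), random.uniform(-1, 1)
    s_next = true_step(s, a)
    real.append((s, a, s_next))
    model.add(s, a, s_next)

imaginary = []
for _ in range(30):  # explore the state-action space virtually through the model
    s, a = random.uniform(-1, 1), random.uniform(-1, 1)
    imaginary.append((s, a, model.predict(s, a)))

batch = real + imaginary  # mixed batch that would feed actor and critic updates
```

In a full method, `batch` would be passed to the actor and critic update steps. A practical implementation would also likely use the GP's predictive variance to decide which imaginary samples to trust, which this sketch does not do.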
Date of Conference: 14-19 May 2017
Date Added to IEEE Xplore: 03 July 2017
ISBN Information:
Electronic ISSN: 2161-4407