Abstract
Reinforcement learning (RL) is an effective method for controlling dynamic systems without prior knowledge. One of the most important and difficult problems in RL is how to improve data efficiency. PILCO is a state-of-the-art data-efficient framework that uses a Gaussian process (GP) to model the system dynamics. However, it focuses only on optimizing the cumulative reward and does not consider the accuracy of the dynamics model, which is an important factor for controller learning. To further improve the data efficiency of PILCO, we propose an active-exploration version of PILCO (AEPILCO), which uses information entropy to quantify how informative samples are. In the policy evaluation stage, we incorporate an information-entropy criterion into the long-term state prediction. With this informative policy evaluation function, the policy improvement stage yields informative policy parameters; executing the resulting policy on the real system produces an informative sample set, which helps to learn an accurate dynamics model. Thus, AEPILCO improves data efficiency by learning an accurate dynamics model through actively selecting informative samples under the information-entropy criterion. We demonstrate the validity and efficiency of the proposed algorithm on several challenging control problems, including the cart-pole, pendubot, double pendulum, and cart-double-pendulum. Both theoretical analysis and experimental results verify that AEPILCO learns a controller using fewer trials.
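The quantity behind the entropy criterion has a simple closed form: a GP's predictive distribution is Gaussian, and the differential entropy of N(m, Σ) in D dimensions is H = ½ ln((2πe)^D |Σ|) (cf. Ahmed and Gokhale, 1989). The Python sketch below illustrates how such an entropy bonus could enter a PILCO-style long-term prediction; it is a minimal illustration under stated assumptions, not the authors' implementation, and model.propagate, model.expected_cost, and the weight beta are hypothetical names.

import numpy as np

def gaussian_entropy(cov):
    """Differential entropy of N(m, cov): 0.5 * ln((2*pi*e)^D * |cov|)."""
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)  # log|cov|, numerically stable
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + logdet)

def evaluate_policy(model, policy, x0_mean, x0_cov, horizon, beta=0.1):
    """Long-term prediction: cumulative expected cost minus an entropy bonus.

    Lower values are better. The -beta * entropy term (hypothetical weighting)
    steers the optimizer toward trajectories whose predicted states carry the
    most model uncertainty, so executing the optimized policy yields
    informative samples for refining the GP dynamics model.
    """
    mean, cov = x0_mean, x0_cov
    total = 0.0
    for _ in range(horizon):
        mean, cov = model.propagate(mean, cov, policy)  # one GP prediction step
        total += model.expected_cost(mean, cov)         # standard PILCO cost term
        total -= beta * gaussian_entropy(cov)           # active-exploration bonus
    return total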
Supported by the National Science Foundation of China (Grants No. 61672190 and No. 61370162).
References
Ahmed, N.A., Gokhale, D.: Entropy expressions and their estimators for multi-variate distributions. IEEE Trans. Inf. Theory 35(3), 688–692 (1989)
Deisenroth, M., Rasmussen, C.E.: PILCO: a model-based and data-efficient approach to policy search. In: Proceedings of the 28th International Conference on Machine Learning, ICML 2011, pp. 465–472. ACM, Bellevue (2011)
Deisenroth, M.P., Fox, D., Rasmussen, C.E.: Gaussian processes for data-efficient learning in robotics and control. IEEE Trans. Pattern Anal. Mach. Intell. 37(2), 408–423 (2015)
Fabisch, A., Metzen, J.H.: Active contextual policy search. J. Mach. Learn. Res. 15(1), 3371–3399 (2014)
Lai, T.L., Robbins, H.: Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1), 4–22 (1985)
Levine, S., Finn, C., Darrell, T., Abbeel, P.: End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 17(1), 1334–1373 (2016)
Ng, A.Y., et al.: Autonomous inverted helicopter flight via reinforcement learning. In: Ang, M.H., Khatib, O. (eds.) Experimental Robotics IX. STAR, vol. 21, pp. 363–372. Springer, Heidelberg (2006). https://doi.org/10.1007/11552246_35
Pan, Y., Theodorou, E., Kontitsis, M.: Sample efficient path integral control under uncertainty. In: Advances in Neural Information Processing Systems, pp. 2314–2322 (2016)
Silver, D., Sutton, R.S., Müller, M.: Sample-based learning and search with permanent and transient memories. In: International Conference on Machine Learning, ICML 2008, pp. 968–975. ACM, Helsinki (2008)
Sutton, R.S.: Dyna, an integrated architecture for learning, planning, and reacting. ACM Sigart Bull. 2(4), 160–163 (1991)
Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. The MIT Press, Cambridge (2006)
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhao, D., Liu, J., Wu, R., Cheng, D., Tang, X. (2018). Data-Efficient Reinforcement Learning Using Active Exploration Method. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science(), vol 11303. Springer, Cham. https://doi.org/10.1007/978-3-030-04182-3_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04181-6
Online ISBN: 978-3-030-04182-3