Abstract
Reinforcement learning (RL) is an effective method for controlling dynamic systems without prior knowledge. One of the most important and difficult problems in RL is how to improve data efficiency. PILCO is a state-of-the-art data-efficient framework that uses a Gaussian process (GP) to model the system dynamics. However, it focuses only on optimizing the cumulative reward and does not consider the accuracy of the dynamics model, which is an important factor for controller learning. To further improve the data efficiency of PILCO, we propose an active-exploration version of PILCO (AEPILCO), which uses information entropy to quantify how informative samples are. In the policy evaluation stage, we incorporate an information-entropy criterion into the long-term state prediction. With this informative policy evaluation function, the policy improvement stage yields informative policy parameters; executing the resulting policy on the real system produces an informative sample set, which helps to learn an accurate dynamics model. Thus, AEPILCO improves data efficiency by learning an accurate dynamics model through actively selecting informative samples under the information-entropy criterion. We demonstrate the validity and efficiency of the proposed algorithm on several challenging control problems, including the cart-pole, pendubot, double pendulum, and cart-double-pendulum. Both theoretical analysis and experimental results verify that AEPILCO learns a controller using fewer trials.
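The quantity behind the entropy criterion has a simple closed form: a GP's predictive distribution is Gaussian, and the differential entropy of N(m, Σ) in D dimensions is H = ½ ln((2πe)^D |Σ|) (cf. Ahmed and Gokhale, 1989). The Python sketch below illustrates how such an entropy bonus could enter a PILCO-style long-term prediction; it is a minimal illustration under stated assumptions, not the authors' implementation, and model.propagate, model.expected_cost, and the weight beta are hypothetical names.

import numpy as np

def gaussian_entropy(cov):
    """Differential entropy of N(m, cov): 0.5 * ln((2*pi*e)^D * |cov|)."""
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)  # log|cov|, numerically stable
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + logdet)

def evaluate_policy(model, policy, x0_mean, x0_cov, horizon, beta=0.1):
    """Long-term prediction: cumulative expected cost minus an entropy bonus.

    Lower values are better. The -beta * entropy term (hypothetical weighting)
    steers the optimizer toward trajectories whose predicted states carry the
    most model uncertainty, so executing the optimized policy yields
    informative samples for refining the GP dynamics model.
    """
    mean, cov = x0_mean, x0_cov
    total = 0.0
    for _ in range(horizon):
        mean, cov = model.propagate(mean, cov, policy)  # one GP prediction step
        total += model.expected_cost(mean, cov)         # standard PILCO cost term
        total -= beta * gaussian_entropy(cov)           # active-exploration bonus
    return total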
Supported by the National Science Foundation of China (Grants No. 61672190 and No. 61370162).
References
Ahmed, N.A., Gokhale, D.: Entropy expressions and their estimators for multi-variate distributions. IEEE Trans. Inf. Theory 35(3), 688–692 (1989)
Deisenroth, M., Rasmussen, C.E.: PILCO: a model-based and data-efficient approach to policy search. In: Proceedings of the 28th International Conference on Machine Learning, ICML 2011, pp. 465–472. ACM, Bellevue (2011)
Deisenroth, M.P., Fox, D., Rasmussen, C.E.: Gaussian processes for data-efficient learning in robotics and control. IEEE Trans. Pattern Anal. Mach. Intell. 37(2), 408–423 (2015)
Fabisch, A., Metzen, J.H.: Active contextual policy search. J. Mach. Learn. Res. 15(1), 3371–3399 (2014)
Lai, T.L., Robbins, H.: Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1), 4–22 (1985)
Levine, S., Finn, C., Darrell, T., Abbeel, P.: End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 17(1), 1334–1373 (2016)
Ng, A.Y., et al.: Autonomous inverted helicopter flight via reinforcement learning. In: Ang, M.H., Khatib, O. (eds.) Experimental Robotics IX. STAR, vol. 21, pp. 363–372. Springer, Heidelberg (2006). https://doi.org/10.1007/11552246_35
Pan, Y., Theodorou, E., Kontitsis, M.: Sample efficient path integral control under uncertainty. In: Advances in Neural Information Processing Systems, pp. 2314–2322 (2016)
Silver, D., Sutton, R.S., Müller, M.: Sample-based learning and search with permanent and transient memories. In: International Conference on Machine Learning, ICML 2008, pp. 968–975. ACM, Helsinki (2008)
Sutton, R.S.: Dyna, an integrated architecture for learning, planning, and reacting. ACM Sigart Bull. 2(4), 160–163 (1991)
Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. The MIT Press, Cambridge (2006)
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhao, D., Liu, J., Wu, R., Cheng, D., Tang, X. (2018). Data-Efficient Reinforcement Learning Using Active Exploration Method. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science(), vol 11303. Springer, Cham. https://doi.org/10.1007/978-3-030-04182-3_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04181-6
Online ISBN: 978-3-030-04182-3