Abstract
Value function approximation has achieved notable success in deep reinforcement learning. Many popular algorithms (e.g., Deep Q-Network) maintain a point estimate of the parameters of the value or policy network. This frequentist perspective, however, is prone to overfitting and offers no representation of uncertainty. In this paper, we perform a Bayesian analysis of the value function. Following the principle of "optimism in the face of uncertainty", we draw posterior samples of the value or policy network from a posterior distribution captured implicitly by a Bayesian hypernetwork. Experimental results show that an implicit posterior that models the structural dependencies between parameters balances exploration and exploitation better, and the resulting method is competitive with state-of-the-art algorithms on the MuJoCo continuous control benchmark.
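The abstract leaves the architecture unspecified, so the following is only a minimal PyTorch sketch of the implicit-posterior idea, not the authors' actual design: a hypernetwork maps a Gaussian noise vector z to all weights of a small Q-network, so each draw of z is one posterior sample of the value function, and the generator's shared layers induce dependencies between the sampled parameters. All names (HyperQNet, z_dim, the layer sizes) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HyperQNet(nn.Module):
    """Hypothetical sketch: a hypernetwork maps noise z ~ N(0, I) to the
    weights of a small Q-network; sampling z gives one posterior sample
    of the value function's parameters."""

    def __init__(self, state_dim, action_dim, hidden=64, z_dim=16):
        super().__init__()
        self.state_dim, self.action_dim, self.hidden = state_dim, action_dim, hidden
        self.z_dim = z_dim
        # Parameter counts of the target Q-network:
        # layer 1: (state_dim + action_dim) -> hidden, layer 2: hidden -> 1
        self.n_w1 = (state_dim + action_dim) * hidden + hidden
        self.n_w2 = hidden + 1
        # One generator produces the full weight vector, so the implicit
        # posterior can capture dependencies between parameters.
        self.gen = nn.Sequential(
            nn.Linear(z_dim, 128), nn.ReLU(),
            nn.Linear(128, self.n_w1 + self.n_w2),
        )

    def forward(self, state, action, z=None):
        if z is None:                       # one posterior sample per call
            z = torch.randn(self.z_dim)
        theta = self.gen(z)                 # flat vector of sampled weights
        in_dim = self.state_dim + self.action_dim
        w1 = theta[: in_dim * self.hidden].view(self.hidden, in_dim)
        b1 = theta[in_dim * self.hidden : self.n_w1]
        w2 = theta[self.n_w1 : self.n_w1 + self.hidden].view(1, self.hidden)
        b2 = theta[-1:]
        x = torch.cat([state, action], dim=-1)
        h = torch.relu(x @ w1.T + b1)
        return h @ w2.T + b2                # sampled Q(s, a)

# Usage: fix one z per episode (Thompson-style sampling) and act greedily
# under the sampled value function, which yields "optimism in the face of
# uncertainty" without injecting explicit exploration noise.
qnet = HyperQNet(state_dim=3, action_dim=1)
z = torch.randn(qnet.z_dim)                 # held fixed for the episode
q = qnet(torch.zeros(1, 3), torch.zeros(1, 1), z)
```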
Acknowledgements
The work is partially supported by the National Natural Science Foundation of China under grants No. U19B2044 and No. 61836011.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, S., Li, B. (2020). Implicit Posterior Sampling Reinforcement Learning for Continuous Control. In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds.) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol. 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63832-0
Online ISBN: 978-3-030-63833-7