Abstract
Learning a policy from sparse rewards is a central challenge in reinforcement learning (RL). The best solutions to this challenge have come from sample-inefficient model-free RL algorithms. Model-based RL algorithms are known to be sample efficient, but few of them can solve sparse-reward settings. To address these limitations, we present PlanQ, a sample-efficient model-based RL framework for sparse-reward settings. PlanQ leverages Q-values, which encode long-term values and serve as a richer feedback signal for actions than immediate rewards: it scores rollouts imagined with its learned model using returns that incorporate Q-values. We verify the efficacy of the approach on robot manipulation tasks ranging from simple to complex. Our experimental results show that PlanQ enhances performance and efficiency in sparse-reward settings.
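For concreteness, below is a minimal sketch of the rollout-scoring idea the abstract describes: short model-based rollouts whose returns are augmented with a learned Q-value, in the style of model-based value expansion. Everything here is an illustrative assumption, not PlanQ's actual implementation; the interfaces `model`, `q_fn`, and `policy`, the horizon, and the terminal-bootstrap form are hypothetical.

```python
def score_rollout(model, q_fn, policy, s0, horizon=5, gamma=0.99):
    """Score one imagined rollout with a Q-augmented return.

    Hypothetical interfaces (illustration only, not from the paper):
      model(s, a) -> (next_state, reward)   # learned dynamics model
      q_fn(s, a)  -> float                  # learned Q-value estimate
      policy(s)   -> action                 # action proposal
    """
    ret, s = 0.0, s0
    for t in range(horizon):
        a = policy(s)
        s, r = model(s, a)
        ret += (gamma ** t) * r  # sparse immediate rewards are often all zero
    # Bootstrap with the critic: the Q-value encodes long-term value beyond
    # the rollout horizon, a richer signal than the sparse rewards alone.
    a = policy(s)
    return ret + (gamma ** horizon) * q_fn(s, a)

# Toy usage with stand-in components (all hypothetical):
model = lambda s, a: (s + a, 0.0)      # dummy dynamics with zero (sparse) reward
q_fn = lambda s, a: -abs(10.0 - s)     # dummy critic: negative distance to a goal
policy = lambda s: 1.0                 # dummy constant action proposal
print(score_rollout(model, q_fn, policy, s0=0.0))
```

In a planner, many candidate rollouts would be scored this way and the best-scoring action (or a weighted combination) executed, so the critic can rank rollouts even when none of them reaches a reward.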
This work is supported by the Key Research & Development Program of Guangdong (2019B090915001), the Guangdong Yangfan Innovative & Entrepreneurial Program (2017YT05G026), the CUHK Direct Grant (4055141), the NSF of China (62176154), and the Hong Kong Centre for Logistics Robotics.
Notes
1. The simpler returns are computed by summing all the returns of one imagined rollout, without weighting them (written out after these notes).
2. Note that this goal transformation does not change the environment dynamics.
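Written out, one plausible reading of Note 1, with $r_t$ the model-predicted reward at step $t$ of an $H$-step imagined rollout, $\gamma$ the discount factor, and $Q$ the learned critic (the Q-augmented form on the right is an assumption in the style of the sketch above, not the paper's exact formula):

$$R_{\text{simple}} = \sum_{t=0}^{H-1} r_t, \qquad R_{Q} = \sum_{t=0}^{H-1} \gamma^{t}\, r_t + \gamma^{H}\, Q(s_H, a_H).$$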
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lei, H., Weng, P., Rojas, J., Guan, Y. (2022). Planning with Q-Values in Sparse Reward Reinforcement Learning. In: Liu, H., et al. (eds.) Intelligent Robotics and Applications. ICIRA 2022. Lecture Notes in Computer Science, vol. 13455. Springer, Cham. https://doi.org/10.1007/978-3-031-13844-7_56
DOI: https://doi.org/10.1007/978-3-031-13844-7_56
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-13843-0
Online ISBN: 978-3-031-13844-7
eBook Packages: Computer Science, Computer Science (R0)