
Planning with Q-Values in Sparse Reward Reinforcement Learning

  • Conference paper
Intelligent Robotics and Applications (ICIRA 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13455)


Abstract

Learning a policy from sparse rewards is a major challenge in reinforcement learning (RL). The most successful solutions to this challenge have relied on sample-inefficient model-free RL algorithms. Model-based RL algorithms are known to be sample efficient, but few of them can handle sparse-reward settings. To address these limitations, we present PlanQ, a sample-efficient model-based RL framework for sparse-reward settings. PlanQ leverages Q-values, which encode long-term value and provide a richer feedback signal for actions than immediate rewards. Accordingly, PlanQ scores the rollouts imagined with its learned model using returns that incorporate Q-values. We verify the efficacy of the approach on robot manipulation tasks ranging from simple to complex. Our experimental results show that PlanQ improves both performance and sample efficiency in sparse-reward settings.
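The planning idea described in the abstract can be pictured with a short, hedged sketch: candidate action sequences are rolled out in the learned model and scored with returns that fold in Q-values rather than only the sparse immediate rewards. This is not the authors' implementation; the names dynamics_model, reward_fn, q_function, and policy, as well as the discounted terminal-bootstrap form of the return, are assumptions for illustration.

```python
# Minimal sketch (assumed, not the paper's code) of scoring imagined
# rollouts from a learned model with Q-value-augmented returns.
import numpy as np

def q_augmented_return(state, action_sequence, dynamics_model, reward_fn,
                       q_function, policy, gamma=0.99):
    """Roll a candidate action sequence through the learned model and sum
    discounted rewards plus a terminal Q-value bootstrap."""
    total, discount = 0.0, 1.0
    for action in action_sequence:
        state = dynamics_model(state, action)     # imagined model step
        total += discount * reward_fn(state)      # sparse reward signal
        discount *= gamma
    # The Q-value carries long-term value beyond the planning horizon,
    # giving a richer signal than the often-zero sparse rewards alone.
    return total + discount * q_function(state, policy(state))

def plan_first_action(state, candidate_sequences, **models):
    """Execute the first action of the best-scoring imagined rollout."""
    scores = [q_augmented_return(state, seq, **models)
              for seq in candidate_sequences]
    return candidate_sequences[int(np.argmax(scores))][0]
```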

This work is supported by the Key Research & Development Program of Guangdong (2019B090915001), the Guangdong Yangfan Innovative & Entrepreneurial Program (2017YT05G026), the CUHK Direct Grant (4055141), the NSF of China (62176154), and the Hong Kong Center for Logistic Robotics.


Notes

  1. The simpler returns are computed by summing all the returns of one imagined rollout without weighting them (see the sketch after these notes).

  2. Note that this goal transformation does not change the environment dynamics.
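For contrast with the Q-augmented scoring sketched above, note 1's baseline can be read as a plain, unweighted sum of one imagined rollout's returns; this reading, and the helper name below, are assumptions for illustration.

```python
# Assumed reading of note 1: the baseline scores one imagined rollout by
# summing its returns with no weighting (no discounting, no Q-value term).
def simple_return(imagined_returns):
    return sum(imagined_returns)
```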


Author information

Corresponding author

Correspondence to Yisheng Guan.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Lei, H., Weng, P., Rojas, J., Guan, Y. (2022). Planning with Q-Values in Sparse Reward Reinforcement Learning. In: Liu, H., et al. Intelligent Robotics and Applications. ICIRA 2022. Lecture Notes in Computer Science, vol 13455. Springer, Cham. https://doi.org/10.1007/978-3-031-13844-7_56


  • DOI: https://doi.org/10.1007/978-3-031-13844-7_56


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-13843-0

  • Online ISBN: 978-3-031-13844-7

  • eBook Packages: Computer Science, Computer Science (R0)
