Towards a Unified Benchmark for Reinforcement Learning in Sparse Reward Environments

Conference paper in Neural Information Processing (ICONIP 2022)

Abstract

Reinforcement learning in sparse reward environments is challenging and has recently received increasing attention, with dozens of new algorithms proposed every year. Despite promising results demonstrated in various sparse reward environments, the field lacks both a unified definition of a sparse reward environment and an experimentally fair way to compare existing algorithms. These gaps hamper in-depth analysis of the underlying problem and hinder further studies. This paper proposes a benchmark to unify the selection of environments and the comparison of algorithms. We first define sparsity to describe the proportion of rewarded states in the entire state space and select environments according to this sparsity. Inspired by the sparsity concept, we categorize the existing algorithms into two classes. To provide a fair comparison of different algorithms, we propose a new metric along with a standard protocol for performance evaluation. Preliminary experimental evaluations of seven algorithms in ten environments provide a startup user guide for the proposed benchmark. We hope the proposed benchmark will promote research on reinforcement learning algorithms in sparse reward environments. The source code of this work is available at https://github.com/simayuhe/ICONIP_Benchmark.git.
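For intuition, the sparsity defined above can be read as the fraction of states (or, in practice, visited transitions) that yield a non-zero reward. The snippet below is a minimal sketch of how such a quantity might be estimated empirically; the Gymnasium API, the uniform random policy, and counting rewarded transitions along sampled trajectories are illustrative assumptions, not the paper's actual definition or evaluation protocol.

```python
# Hedged sketch: approximate "sparsity" as the fraction of sampled transitions
# that return a non-zero reward under a random policy. The environment API
# (Gymnasium) and the sampling scheme are assumptions for illustration only;
# the paper's exact definition and protocol may differ.
import gymnasium as gym


def estimate_sparsity(env_id: str, num_steps: int = 100_000, seed: int = 0) -> float:
    env = gym.make(env_id)
    obs, _ = env.reset(seed=seed)
    rewarded = 0
    for _ in range(num_steps):
        action = env.action_space.sample()            # random exploration
        obs, reward, terminated, truncated, _ = env.step(action)
        rewarded += int(reward != 0)                  # count rewarded transitions
        if terminated or truncated:
            obs, _ = env.reset()
    env.close()
    return rewarded / num_steps                       # proportion of rewarded transitions


if __name__ == "__main__":
    # FrozenLake only rewards reaching the goal, so the estimate is close to zero.
    print(estimate_sparsity("FrozenLake-v1", num_steps=10_000))
```

A low value of this estimate indicates that random exploration rarely encounters reward, which is the regime the benchmark targets.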

This work was supported in part by the National Key Research and Development Program of China under Grant No. 2020AAA0103401, in part by the Natural Science Foundation of China under Grant No. 62076238 and 61902402, in part by the CCF-Tencent Open Fund, and in part by the Strategic Priority Research Program of Chinese Academy of Sciences under Grant No. XDA27000000.



Author information

Correspondence to Junliang Xing.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Kang, Y., Zhao, E., Zang, Y., Li, K., Xing, J. (2023). Towards a Unified Benchmark for Reinforcement Learning in Sparse Reward Environments. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Communications in Computer and Information Science, vol 1791. Springer, Singapore. https://doi.org/10.1007/978-981-99-1639-9_16

  • DOI: https://doi.org/10.1007/978-981-99-1639-9_16

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-1638-2

  • Online ISBN: 978-981-99-1639-9

  • eBook Packages: Computer Science, Computer Science (R0)
