Towards a Unified Benchmark for Reinforcement Learning in Sparse Reward Environments

Conference paper in Neural Information Processing (ICONIP 2022)

Abstract

Reinforcement learning in sparse reward environments is challenging and has recently received increasing attention, with dozens of new algorithms proposed every year. Despite promising results demonstrated in various sparse reward environments, the field lacks both a unified definition of a sparse reward environment and an experimentally fair way to compare existing algorithms. These gaps hamper in-depth analysis of the underlying problem and hinder further studies. This paper proposes a benchmark to unify the selection of environments and the comparison of algorithms. We first define sparsity to describe the proportion of rewarded states in the entire state space and select environments according to this sparsity. Inspired by the sparsity concept, we categorize the existing algorithms into two classes. To provide a fair comparison of different algorithms, we propose a new metric along with a standard protocol for performance evaluation. Preliminary experimental evaluations of seven algorithms in ten environments provide a startup user guide for the proposed benchmark. We hope the proposed benchmark will promote research on reinforcement learning algorithms in sparse reward environments. The source code of this work is available at https://github.com/simayuhe/ICONIP_Benchmark.git.
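For intuition, the sparsity defined above can be read as the fraction of states (or, in practice, visited transitions) that yield a non-zero reward. The snippet below is a minimal sketch of how such a quantity might be estimated empirically; the Gymnasium API, the uniform random policy, and counting rewarded transitions along sampled trajectories are illustrative assumptions, not the paper's actual definition or evaluation protocol.

```python
# Hedged sketch: approximate "sparsity" as the fraction of sampled transitions
# that return a non-zero reward under a random policy. The environment API
# (Gymnasium) and the sampling scheme are assumptions for illustration only;
# the paper's exact definition and protocol may differ.
import gymnasium as gym


def estimate_sparsity(env_id: str, num_steps: int = 100_000, seed: int = 0) -> float:
    env = gym.make(env_id)
    obs, _ = env.reset(seed=seed)
    rewarded = 0
    for _ in range(num_steps):
        action = env.action_space.sample()            # random exploration
        obs, reward, terminated, truncated, _ = env.step(action)
        rewarded += int(reward != 0)                  # count rewarded transitions
        if terminated or truncated:
            obs, _ = env.reset()
    env.close()
    return rewarded / num_steps                       # proportion of rewarded transitions


if __name__ == "__main__":
    # FrozenLake only rewards reaching the goal, so the estimate is close to zero.
    print(estimate_sparsity("FrozenLake-v1", num_steps=10_000))
```

A low value of this estimate indicates that random exploration rarely encounters reward, which is the regime the benchmark targets.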

This work was supported in part by the National Key Research and Development Program of China under Grant No. 2020AAA0103401, in part by the Natural Science Foundation of China under Grant No. 62076238 and 61902402, in part by the CCF-Tencent Open Fund, and in part by the Strategic Priority Research Program of Chinese Academy of Sciences under Grant No. XDA27000000.



Author information

Correspondence to Junliang Xing.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Kang, Y., Zhao, E., Zang, Y., Li, K., Xing, J. (2023). Towards a Unified Benchmark for Reinforcement Learning in Sparse Reward Environments. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Communications in Computer and Information Science, vol 1791. Springer, Singapore. https://doi.org/10.1007/978-981-99-1639-9_16

  • DOI: https://doi.org/10.1007/978-981-99-1639-9_16

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-1638-2

  • Online ISBN: 978-981-99-1639-9

  • eBook Packages: Computer Science, Computer Science (R0)
