Abstract
Reinforcement learning in sparse reward environments is challenging and has recently received increasing attention, with dozens of new algorithms proposed every year. Despite promising results demonstrated in various sparse reward environments, the field still lacks a unified definition of a sparse reward environment and an experimentally fair way to compare existing algorithms. These gaps hamper in-depth analysis of the underlying problem and hinder further study. This paper proposes a benchmark that unifies the selection of environments and the comparison of algorithms. We first define sparsity as the proportion of rewarded states in the entire state space and select environments according to this sparsity. Guided by the sparsity concept, we categorize existing algorithms into two classes. To enable a fair comparison of different algorithms, we propose a new evaluation metric along with a standard evaluation protocol. Preliminary experimental evaluations of seven algorithms across ten environments serve as a starter's guide to the proposed benchmark. We hope the benchmark will promote research on reinforcement learning algorithms in sparse reward environments. The source code of this work is available at https://github.com/simayuhe/ICONIP_Benchmark.git.
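To make the sparsity definition concrete, the sketch below estimates the proportion of rewarded states empirically by sampling transitions with a random policy in a Gym-style environment. This is an illustrative assumption, not the paper's published measurement procedure: the paper defines sparsity over the entire state space, which is generally intractable to enumerate, so the sketch approximates it over visited states. The function name `estimate_sparsity` is hypothetical, and the classic Gym (pre-0.26) four-tuple `step` API is assumed.

```python
import gym


def estimate_sparsity(env_id: str, num_steps: int = 100_000) -> float:
    """Rough Monte Carlo estimate of sparsity: the fraction of visited
    states that yield a nonzero reward under a uniformly random policy.

    Hypothetical helper for illustration; it approximates the paper's
    sparsity (proportion of rewarded states in the entire state space)
    by sampling, since full state enumeration is usually intractable.
    """
    env = gym.make(env_id)
    env.reset()
    rewarded, total = 0, 0
    for _ in range(num_steps):
        # Classic Gym API assumed: step returns (obs, reward, done, info).
        _, reward, done, _ = env.step(env.action_space.sample())
        total += 1
        if reward != 0:
            rewarded += 1
        if done:
            env.reset()
    env.close()
    return rewarded / total


# Example: Montezuma's Revenge is a canonical sparse reward Atari game,
# so its estimate should be close to zero under a random policy.
# print(estimate_sparsity("MontezumaRevengeNoFrameskip-v4"))
```

A random policy systematically underestimates reachability of rewarded states in hard exploration games, so an estimate like this is best read as an upper bound on how easy the environment is, rather than an exact sparsity value.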
This work was supported in part by the National Key Research and Development Program of China under Grant No. 2020AAA0103401, in part by the Natural Science Foundation of China under Grants No. 62076238 and 61902402, in part by the CCF-Tencent Open Fund, and in part by the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA27000000.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Kang, Y., Zhao, E., Zang, Y., Li, K., Xing, J. (2023). Towards a Unified Benchmark for Reinforcement Learning in Sparse Reward Environments. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Communications in Computer and Information Science, vol 1791. Springer, Singapore. https://doi.org/10.1007/978-981-99-1639-9_16
DOI: https://doi.org/10.1007/978-981-99-1639-9_16
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-1638-2
Online ISBN: 978-981-99-1639-9
eBook Packages: Computer Science, Computer Science (R0)