Abstract
Traditional exploration methods in reinforcement learning rely on well-designed extrinsic rewards. However, many real-world scenarios involve sparse or delayed rewards. One solution, inspired by curious behaviors in animals, is to let the agent develop its own intrinsic rewards. In this paper we propose a novel end-to-end curiosity mechanism that uses learning progress as a novelty bonus. We compare a policy-based and a visual-based progress bonus for driving the agent towards hard-to-learn regions of the state space. We further leverage the agent's own learning to identify the most critical regions, which yields more sample-efficient and more global exploration strategies. We evaluate our method on a variety of benchmark environments, including Minigrid, Super Mario Bros., and Atari games. Experimental results show that our method outperforms prior approaches on most tasks in terms of exploration efficiency and average score, especially on tasks that require high-level exploration patterns or contain deceptive rewards.
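To make the general idea concrete, the sketch below shows one common way a learning-progress bonus can be wired into the reward signal: a small predictive model is updated on each visited state, and the drop in its loss (its progress) is added to the extrinsic reward. This is only an illustrative sketch under assumed details, not the paper's policy-based or visual-based bonus; the class name ProgressBonus and the hyperparameters lr and beta are placeholders introduced here.

# Minimal, illustrative sketch (not the authors' implementation) of a
# learning-progress intrinsic reward: the decrease in a predictive model's
# loss on a state after one update step is used as the novelty bonus.
import numpy as np

class ProgressBonus:
    """Linear state-reconstruction model; progress = loss before - loss after update."""
    def __init__(self, obs_dim, lr=1e-2, beta=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(obs_dim, obs_dim))  # reconstruction weights
        self.lr = lr      # step size for the model update (placeholder value)
        self.beta = beta  # scale of the intrinsic bonus (placeholder value)

    def _loss_and_grad(self, s):
        err = self.W @ s - s                 # reconstruction error
        loss = 0.5 * float(err @ err)        # squared-error loss
        grad = np.outer(err, s)              # gradient of the loss w.r.t. W
        return loss, grad

    def bonus(self, s):
        loss_before, grad = self._loss_and_grad(s)
        self.W -= self.lr * grad             # one model update on this state
        loss_after, _ = self._loss_and_grad(s)
        progress = max(loss_before - loss_after, 0.0)  # how much the model improved
        return self.beta * progress          # intrinsic reward r_int

# Usage: augment the (possibly sparse) environment reward with the bonus.
if __name__ == "__main__":
    obs_dim = 8
    curiosity = ProgressBonus(obs_dim)
    s = np.ones(obs_dim)                     # stand-in for an observed state
    r_ext = 0.0                              # sparse extrinsic reward
    r_total = r_ext + curiosity.bonus(s)     # reward actually given to the agent
    print(f"shaped reward: {r_total:.4f}")

In the paper's setting the progress signal comes from the agent's policy or from a visual model rather than from a toy reconstruction model, but combining the task reward with a scaled progress term, as above, is the standard recipe for this family of intrinsic-reward methods.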