Abstract
Long-horizon exploration remains a challenging problem in deep reinforcement learning, especially when an environment provides only sparse or poorly defined extrinsic rewards. To tackle this challenge, we propose a reinforcement learning agent that solves hard exploration tasks by leveraging a lifelong exploration bonus. Our method decomposes this bonus into a short-term and a long-term intrinsic reward. The former drives local exploration, probing the consequences of short-term decisions, while the latter explicitly encourages deep exploration strategies by remaining large throughout training. As a measure of intrinsic novelty, we propose the reconstruction error of an observation given its context, which captures flexible exploration behaviors across different time horizons. We demonstrate the effectiveness of our approach in visually rich environments in MiniGrid, DMLab, and Atari games. Experimental results show that our method outperforms the baselines on most tasks in both score and exploration efficiency.
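The core idea of a reconstruction-error novelty bonus can be illustrated with a minimal sketch. The toy model below is a hypothetical stand-in for the paper's architecture (the class name, the linear reconstruction model, and the learning rate are all illustrative assumptions, not the authors' method): a predictor tries to reconstruct an observation from its context, and the intrinsic reward is the squared reconstruction error, which shrinks as an observation becomes familiar.

```python
import numpy as np

class ReconstructionBonus:
    """Toy novelty bonus (illustrative only): intrinsic reward is the
    squared error of a linear model that reconstructs an observation
    from its context. Frequently seen (context, observation) pairs
    become easy to reconstruct, so their bonus decays over training."""

    def __init__(self, ctx_dim, obs_dim, lr=0.1):
        self.W = np.zeros((obs_dim, ctx_dim))  # reconstruction weights
        self.lr = lr

    def reward(self, context, obs):
        pred = self.W @ context
        err = obs - pred
        bonus = float(err @ err)  # squared reconstruction error as novelty
        # One normalized SGD step so repeated pairs are learned stably.
        step = self.lr / (context @ context + 1e-8)
        self.W += step * np.outer(err, context)
        return bonus

rng = np.random.default_rng(0)
bonus = ReconstructionBonus(ctx_dim=4, obs_dim=4)
ctx = rng.normal(size=4)
obs = rng.normal(size=4)

first = bonus.reward(ctx, obs)          # novel pair: large bonus
for _ in range(50):
    later = bonus.reward(ctx, obs)      # familiar pair: bonus decays
print(first > later)
```

In this sketch the bonus behaves like the paper's short-term signal for a single model; the lifelong component would additionally keep the bonus large for regions of the state space the agent has never reconstructed well, rather than only for the current episode.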
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Bougie, N., Ichise, R. (2021). Intrinsically Motivated Lifelong Exploration in Reinforcement Learning. In: Yada, K., et al. Advances in Artificial Intelligence. JSAI 2020. Advances in Intelligent Systems and Computing, vol 1357. Springer, Cham. https://doi.org/10.1007/978-3-030-73113-7_10
Print ISBN: 978-3-030-73112-0
Online ISBN: 978-3-030-73113-7
eBook Packages: Intelligent Technologies and Robotics