Abstract
In the last few years, research activity around reinforcement learning tasks formulated over environments with sparse rewards has been especially notable. Among the numerous approaches proposed to address these hard exploration problems, intrinsic motivation mechanisms are arguably among the most studied alternatives to date. Advances reported in this area have tackled the exploration issue by proposing new algorithmic ideas for measuring novelty. However, most efforts in this direction have overlooked the influence of the different design choices and parameter settings introduced alongside these mechanisms to improve the effect of the generated intrinsic bonus, and have neglected to apply those choices to other intrinsic motivation techniques that could also benefit from them. Furthermore, some of these intrinsic methods are evaluated with different base reinforcement learning algorithms (e.g. PPO, IMPALA) and neural network architectures, making it hard to fairly compare the reported results and the actual progress contributed by each solution. The goal of this work is to stress the importance of this issue in reinforcement learning over hard exploration environments, exposing the variability and susceptibility of state-of-the-art intrinsic motivation techniques to diverse design factors. Ultimately, the experiments reported herein underscore the importance of carefully selecting these design aspects in accordance with the exploration requirements of the environment and the task at hand, under a common experimental setup, so that fair comparisons can be guaranteed.
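To ground the discussion, the following is a minimal sketch of how intrinsic motivation techniques of the kind studied here shape the learning signal: a novelty bonus (here an RND-style prediction error, after Burda et al.) is scaled and added to the environment's extrinsic reward. All names (`RNDBonus`, `beta`, layer sizes) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class RNDBonus(nn.Module):
    """Random Network Distillation-style novelty bonus (sketch).

    A trainable predictor is regressed onto a frozen, randomly initialised
    target network; the prediction error is large for rarely seen
    observations and is used as the intrinsic reward.
    """
    def __init__(self, obs_dim: int, feat_dim: int = 64):
        super().__init__()
        self.target = nn.Linear(obs_dim, feat_dim)     # frozen random network
        self.predictor = nn.Linear(obs_dim, feat_dim)  # trained online
        for p in self.target.parameters():
            p.requires_grad_(False)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Per-observation mean squared prediction error as the novelty signal.
        return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)

def total_reward(r_ext: torch.Tensor, r_int: torch.Tensor, beta: float = 0.1):
    # The base RL algorithm (e.g. PPO) optimises the mixed signal; the scaling
    # coefficient beta is one of the design choices this study examines.
    return r_ext + beta * r_int
```

For instance, `RNDBonus(obs_dim=16)(torch.randn(8, 16))` yields one intrinsic reward per observation in a batch; how `beta` is set, and whether the bonus is normalised, are precisely the kind of design factors whose influence is studied here.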
Notes
- 1.
Depending on the task under consideration, the novelty can be associated with the very last performed action and/or the next state visited by the agent in the trajectory.
- 2.
A rollout is denoted as \(\tau\), whereas the i-th rollout is denoted as \(\tau_i\).
- 3.
We note that the choice of neural network architecture affects not only the actor-critic modules, but also those IM approaches that hinge on neural computation.
- 4.
- 5.
Even with different neural architectures and base RL algorithms, they successfully solve the same tasks in MiniGrid.
- 6.
We note that the number of parameters is slightly increased, but the two networks also differ in the type of layers used: the two-headed network uses CNNs, while the independent actor-critic only uses dense layers (see the sketch below).
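For concreteness, here is a hedged sketch of the two designs contrasted in this note; layer sizes and the observation shape are illustrative assumptions, not the paper's exact architectures.

```python
import torch
import torch.nn as nn

class TwoHeadedAC(nn.Module):
    """Shared CNN trunk with separate policy and value heads."""
    def __init__(self, n_actions: int):
        super().__init__()
        # Assumes a 3x7x7 image observation (MiniGrid-like); illustrative only.
        self.trunk = nn.Sequential(nn.Conv2d(3, 16, kernel_size=3),
                                   nn.ReLU(), nn.Flatten())
        feat_dim = 16 * 5 * 5
        self.pi = nn.Linear(feat_dim, n_actions)  # actor head
        self.v = nn.Linear(feat_dim, 1)           # critic head

    def forward(self, x: torch.Tensor):
        h = self.trunk(x)  # features shared by both heads
        return self.pi(h), self.v(h)

class IndependentAC(nn.Module):
    """Fully separate dense actor and critic with no shared parameters."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                   nn.Linear(64, n_actions))
        self.critic = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                    nn.Linear(64, 1))

    def forward(self, x: torch.Tensor):
        return self.actor(x), self.critic(x)
```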
References
Silver, D., et al.: Mastering the game of go without human knowledge. Nature 550(7676), 354–359 (2017)
Baker, B., et al.: Emergent tool use from multi-agent autocurricula. arXiv:1909.07528 (2019)
Holzinger, A.: Introduction to machine learning & knowledge extraction (MAKE). Mach. Learn. Knowl. Extr. 1(1), 1–20 (2019)
Aubret, A., Matignon, L., Hassas, S.: A survey on intrinsic motivation in reinforcement learning. arXiv:1908.06976 (2019)
Ho, J., Ermon, S.: Generative adversarial imitation learning. In: Advances in Neural Information Processing Systems, vol. 29 (2016)
Finn, C., Levine, S., Abbeel, P.: Guided cost learning: deep inverse optimal control via policy optimization. In: International Conference on Machine Learning (2016)
Grigorescu, D.: Curiosity, intrinsic motivation and the pleasure of knowledge. J. Educ. Sci. Psychol. 10(1) (2020)
Raileanu, R., Rocktäschel, T.: RIDE: rewarding impact-driven exploration for procedurally-generated environments. arXiv:2002.12292 (2020)
Badia, A.P., et al.: Never give up: learning directed exploration strategies. arXiv:2002.06038 (2020)
Flet-Berliac, Y., Ferret, J., Pietquin, O., Preux, P., Geist, M.: Adversarially guided actor-critic. arXiv:2102.04376 (2021)
Pathak, D., Agrawal, P., Efros, A.A., Darrell, T.: Curiosity-driven exploration by self-supervised prediction. In: International Conference on Machine Learning, pp. 2778–2787 (2017)
Burda, Y., Edwards, H., Storkey, A., Klimov, O.: Exploration by random network distillation. arXiv:1810.12894 (2018)
Andrychowicz, M., et al.: What matters in on-policy reinforcement learning? A large-scale empirical study. arXiv:2006.05990 (2020)
Andrychowicz, M., et al.: What matters for on-policy deep actor-critic methods? A large-scale study. In: International Conference on Learning Representations (2020)
Bellemare, M., Srinivasan, S., Ostrovski, G., Schaul, T., Saxton, D., Munos, R.: Unifying count-based exploration and intrinsic motivation. In: Advances in Neural Information Processing Systems, vol. 29 (2016)
Tang, H., et al.: #Exploration: a study of count-based exploration for deep reinforcement learning. In: Advances in Neural Information Processing Systems, pp. 2753–2762 (2017)
Machado, M.C., Bellemare, M.G., Bowling, M.: Count-based exploration with the successor representation. In: AAAI Conference on Artificial Intelligence, vol. 34, no. 4, pp. 5125–5133 (2020)
Pîslar, M., Szepesvari, D., Ostrovski, G., Borsa, D., Schaul, T.: When should agents explore? arXiv:2108.11811 (2021)
Zhang, T., et al.: NovelD: a simple yet effective exploration criterion. In: Advances in Neural Information Processing Systems, vol. 34 (2021)
Bougie, N., Ichise, R.: Fast and slow curiosity for high-level exploration in reinforcement learning. Appl. Intell. 51(2), 1086–1107 (2020). https://doi.org/10.1007/s10489-020-01849-3
Campero, A., Raileanu, R., Küttler, H., Tenenbaum, J.B., Rocktäschel, T., Grefenstette, E.: Learning with AMIGo: adversarially motivated intrinsic goals. arXiv:2006.12122 (2020)
Taiga, A.A., Fedus, W., Machado, M.C., Courville, A., Bellemare, M.G.: On bonus-based exploration methods in the arcade learning environment. arXiv:2109.11052 (2021)
Hessel, M., et al.: Rainbow: combining improvements in deep reinforcement learning. In: AAAI Conference on Artificial Intelligence (2018)
Burda, Y., Edwards, H., Pathak, D., Storkey, A., Darrell, T., Efros, A.A.: Large-scale study of curiosity-driven learning. In: ICLR (2019)
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv:1707.06347 (2017)
Orsini, M., et al.: What matters for adversarial imitation learning? In: Advances in Neural Information Processing Systems, vol. 34 (2021)
Jing, X., et al.: Divide and explore: multi-agent separate exploration with shared intrinsic motivations (2022)
Seurin, M., Strub, F., Preux, P., Pietquin, O.: Don’t do what doesn’t matter: intrinsic motivation with action usefulness. arXiv:2105.09992 (2021)
Zha, D., Ma, W., Yuan, L., Hu, X., Liu, J.: Rank the episodes: a simple approach for exploration in procedurally-generated environments. arXiv:2101.08152 (2021)
Espeholt, L., et al.: IMPALA: scalable distributed deep-RL with importance weighted actor-learner architectures. In: International Conference on Machine Learning, pp. 1407–1416 (2018)
Chevalier-Boisvert, M., Willems, L., Pal, S.: Minimalistic gridworld environment for OpenAI Gym. http://github.com/maximecb/gym-minigrid (2018)
Schulman, J., Moritz, P., Levine, S., Jordan, M., Abbeel, P.: High-dimensional continuous control using generalized advantage estimation. arXiv:1506.02438 (2015)
Acknowledgments
A. Andres and J. Del Ser would like to thank the Basque Government for its funding support through the research group MATHMODE (T1294-19) and the BIKAINTEK PhD support program.
Copyright information
© 2022 IFIP International Federation for Information Processing
Cite this paper
Andres, A., Villar-Rodriguez, E., Del Ser, J. (2022). An Evaluation Study of Intrinsic Motivation Techniques Applied to Reinforcement Learning over Hard Exploration Environments. In: Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E. (eds) Machine Learning and Knowledge Extraction. CD-MAKE 2022. Lecture Notes in Computer Science, vol 13480. Springer, Cham. https://doi.org/10.1007/978-3-031-14463-9_13