Abstract
The paper proposes and analyses the evolution of a deep reinforcement learning agent in a stochastic environment that represents a simple game. We investigate embedding a planning loop, based on a learned model in the style of I2A (Imagination-Augmented Agents), into the training of a model-free agent in order to solve a stochastic grid environment. The performance of the proposed agent architecture is compared against a baseline A2C (Advantage Actor-Critic) agent.
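To make the architecture concrete, below is a minimal PyTorch sketch of an I2A-style agent for a small grid world. It assumes a discrete action space and observations of shape (channels, height, width); an imagination rollout is produced for each candidate first action by a learned one-step environment model, summarised with a GRU, and combined with a model-free convolutional path before the actor and critic heads. The class names, layer sizes, rollout depth and the random rollout policy are illustrative assumptions, not the authors' implementation (the actual source code is available in the I2AGrid repository listed in the references).

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical

class EnvironmentModel(nn.Module):
    # Learned one-step model: predicts next observation and reward
    # from the current observation and a one-hot encoded action.
    def __init__(self, obs_channels, n_actions):
        super().__init__()
        self.conv = nn.Conv2d(obs_channels + n_actions, 32, 3, padding=1)
        self.next_obs = nn.Conv2d(32, obs_channels, 3, padding=1)
        self.reward = nn.Linear(32, 1)

    def forward(self, obs, action_onehot):
        # Broadcast the action over the grid and concatenate with the observation.
        a = action_onehot[:, :, None, None].expand(-1, -1, obs.size(2), obs.size(3))
        h = F.relu(self.conv(torch.cat([obs, a], dim=1)))
        return self.next_obs(h), self.reward(h.mean(dim=(2, 3)))

class I2AAgent(nn.Module):
    # Model-free A2C trunk augmented with encoded imagination rollouts.
    def __init__(self, obs_channels, n_actions, rollout_depth=3, hidden=128):
        super().__init__()
        self.n_actions, self.rollout_depth = n_actions, rollout_depth
        self.env_model = EnvironmentModel(obs_channels, n_actions)
        self.rollout_encoder = nn.Conv2d(obs_channels, 16, 3, padding=1)
        self.rollout_rnn = nn.GRU(input_size=16, hidden_size=hidden, batch_first=True)
        self.model_free = nn.Conv2d(obs_channels, 16, 3, padding=1)
        self.policy = nn.Linear(hidden * n_actions + 16, n_actions)
        self.value = nn.Linear(hidden * n_actions + 16, 1)

    def imagine(self, obs, first_action):
        # Roll the learned model forward and summarise the imagined trajectory.
        feats, action = [], first_action
        for _ in range(self.rollout_depth):
            obs, _ = self.env_model(obs, F.one_hot(action, self.n_actions).float())
            feats.append(F.relu(self.rollout_encoder(obs)).mean(dim=(2, 3)))
            # Simplification: random rollout policy instead of a distilled one.
            action = torch.randint(0, self.n_actions, action.shape, device=obs.device)
        _, h = self.rollout_rnn(torch.stack(feats, dim=1))
        return h.squeeze(0)

    def forward(self, obs):
        batch = obs.size(0)
        # One imagined rollout per candidate first action, as in I2A.
        rollouts = [self.imagine(obs, obs.new_full((batch,), a, dtype=torch.long))
                    for a in range(self.n_actions)]
        mf = F.relu(self.model_free(obs)).mean(dim=(2, 3))
        joint = torch.cat(rollouts + [mf], dim=1)
        return Categorical(logits=self.policy(joint)), self.value(joint)

The returned policy distribution and value estimate can then be plugged into a standard A2C loss (policy gradient with an advantage baseline, value regression and an entropy bonus), which is also how the baseline agent mentioned in the abstract would be trained.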
References
Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015)
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms (2017). https://arxiv.org/abs/1707.06347
Mnih, V., et al.: Asynchronous methods for deep reinforcement learning. In: International Conference on Machine Learning, pp. 1928–1937 (2016)
Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor (2018). https://arxiv.org/abs/1801.01290
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M.: Playing Atari with deep reinforcement learning (2013). https://arxiv.org/abs/1312.5602
Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning (2015). https://arxiv.org/abs/1509.02971
Silver, D., et al.: Mastering the game of Go without human knowledge. Nature 550(7676), 354–359 (2017). https://doi.org/10.1038/nature24270
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T.: A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362(6419), 1140–1144 (2018)
Talvitie, E.: Model regularization for stable sample rollouts. In: Thirtieth Conference on Uncertainty in Artificial Intelligence, pp. 780–789 (2014)
Talvitie, E.: Agnostic system identification for Monte Carlo planning. In: Twenty-Ninth AAAI Conference on Artificial Intelligence (2015)
Racanière, S., et al.: Imagination-augmented agents for deep reinforcement learning. In: Advances in Neural Information Processing Systems, pp. 5690–5701 (2017)
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626 (2017)
Lapan, M.: Deep Reinforcement Learning Hands-On: Apply Modern RL Methods, with Deep Q-networks, Value Iteration, Policy Gradients, TRPO, AlphaGo Zero and More. Packt Publishing, Birmingham (2018)
Hafner, D., et al.: Learning latent dynamics for planning from pixels (2018). https://arxiv.org/abs/1811.04551
Ha, D., Schmidhuber, J.: World models (2018). https://arxiv.org/abs/1803.10122
Schrittwieser, J., et al.: Mastering Atari, Go, chess and shogi by planning with a learned model (2019). https://arxiv.org/abs/1911.08265
Pal, C.V.: I2AGrid. Online source code (2020). https://github.com/ValentinPal/I2AGrid
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Pal, C.V., Leon, F. (2020). A Modified I2A Agent for Learning in a Stochastic Environment. In: Nguyen, N.T., Hoang, B.H., Huynh, C.P., Hwang, D., Trawiński, B., Vossen, G. (eds) Computational Collective Intelligence. ICCCI 2020. Lecture Notes in Computer Science, vol. 12496. Springer, Cham. https://doi.org/10.1007/978-3-030-63007-2_30
DOI: https://doi.org/10.1007/978-3-030-63007-2_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63006-5
Online ISBN: 978-3-030-63007-2