Abstract
The paper proposes and analyses the evolution of a deep reinforcement learning agent in a stochastic environment that represents a simple game. We investigate embedding a planning loop, based on a learned model in the style of I2A (Imagination-Augmented Agents), into the training of a model-free agent in order to solve a stochastic grid environment. The performance of the proposed agent architecture is compared against a baseline A2C (Advantage Actor-Critic) agent.
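To make the architecture concrete, below is a minimal PyTorch sketch of an I2A-style agent for a small grid world. It assumes a discrete action space and observations of shape (channels, height, width); an imagination rollout is produced for each candidate first action by a learned one-step environment model, summarised with a GRU, and combined with a model-free convolutional path before the actor and critic heads. The class names, layer sizes, rollout depth and the random rollout policy are illustrative assumptions, not the authors' implementation (the actual source code is available in the I2AGrid repository listed in the references).

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical

class EnvironmentModel(nn.Module):
    # Learned one-step model: predicts next observation and reward
    # from the current observation and a one-hot encoded action.
    def __init__(self, obs_channels, n_actions):
        super().__init__()
        self.conv = nn.Conv2d(obs_channels + n_actions, 32, 3, padding=1)
        self.next_obs = nn.Conv2d(32, obs_channels, 3, padding=1)
        self.reward = nn.Linear(32, 1)

    def forward(self, obs, action_onehot):
        # Broadcast the action over the grid and concatenate with the observation.
        a = action_onehot[:, :, None, None].expand(-1, -1, obs.size(2), obs.size(3))
        h = F.relu(self.conv(torch.cat([obs, a], dim=1)))
        return self.next_obs(h), self.reward(h.mean(dim=(2, 3)))

class I2AAgent(nn.Module):
    # Model-free A2C trunk augmented with encoded imagination rollouts.
    def __init__(self, obs_channels, n_actions, rollout_depth=3, hidden=128):
        super().__init__()
        self.n_actions, self.rollout_depth = n_actions, rollout_depth
        self.env_model = EnvironmentModel(obs_channels, n_actions)
        self.rollout_encoder = nn.Conv2d(obs_channels, 16, 3, padding=1)
        self.rollout_rnn = nn.GRU(input_size=16, hidden_size=hidden, batch_first=True)
        self.model_free = nn.Conv2d(obs_channels, 16, 3, padding=1)
        self.policy = nn.Linear(hidden * n_actions + 16, n_actions)
        self.value = nn.Linear(hidden * n_actions + 16, 1)

    def imagine(self, obs, first_action):
        # Roll the learned model forward and summarise the imagined trajectory.
        feats, action = [], first_action
        for _ in range(self.rollout_depth):
            obs, _ = self.env_model(obs, F.one_hot(action, self.n_actions).float())
            feats.append(F.relu(self.rollout_encoder(obs)).mean(dim=(2, 3)))
            # Simplification: random rollout policy instead of a distilled one.
            action = torch.randint(0, self.n_actions, action.shape, device=obs.device)
        _, h = self.rollout_rnn(torch.stack(feats, dim=1))
        return h.squeeze(0)

    def forward(self, obs):
        batch = obs.size(0)
        # One imagined rollout per candidate first action, as in I2A.
        rollouts = [self.imagine(obs, obs.new_full((batch,), a, dtype=torch.long))
                    for a in range(self.n_actions)]
        mf = F.relu(self.model_free(obs)).mean(dim=(2, 3))
        joint = torch.cat(rollouts + [mf], dim=1)
        return Categorical(logits=self.policy(joint)), self.value(joint)

The returned policy distribution and value estimate can then be plugged into a standard A2C loss (policy gradient with an advantage baseline, value regression and an entropy bonus), which is also how the baseline agent mentioned in the abstract would be trained.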
References
Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015)
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms (2017). https://arxiv.org/abs/1707.06347
Mnih, V., et al.: Asynchronous methods for deep reinforcement learning. In: International Conference on Machine Learning, pp. 1928–1937 (2016)
Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor (2018). https://arxiv.org/abs/1801.01290
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M.: Playing Atari with deep reinforcement learning (2013). https://arxiv.org/abs/1312.5602
Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning (2015). https://arxiv.org/abs/1509.02971
Silver, D., et al.: Mastering the game of Go without human knowledge. Nature 550(7676), 354–359 (2017). https://doi.org/10.1038/nature24270
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T.: A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362(6419), 1140–1144 (2018)
Talvitie, E.: Model regularization for stable sample rollouts. In: Thirtieth Conference on Uncertainty in Artificial Intelligence, pp. 780–789 (2014)
Talvitie, E.: Agnostic system identification for Monte Carlo planning. In: Twenty-Ninth AAAI Conference on Artificial Intelligence (2015)
Racanière, S., et al.: Imagination-augmented agents for deep reinforcement learning. In: Advances in Neural Information Processing Systems, pp. 5690–5701 (2017)
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626 (2017)
Lapan, M.: Deep Reinforcement Learning Hands-On: Apply Modern RL Methods, with Deep Q-networks, Value Iteration, Policy Gradients, TRPO, AlphaGo Zero and More. Packt Publishing, Birmingham (2018)
Hafner, D., et al.: Learning latent dynamics for planning from pixels (2018). https://arxiv.org/abs/1811.04551
Ha, D., Schmidhuber, J.: World models (2018). https://arxiv.org/abs/1803.10122
Schrittwieser, J., et al.: Mastering Atari, Go, chess and shogi by planning with a learned model (2019). https://arxiv.org/abs/1911.08265
Pal, C.V.: I2AGrid. Online source code (2020). https://github.com/ValentinPal/I2AGrid
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Pal, C.V., Leon, F. (2020). A Modified I2A Agent for Learning in a Stochastic Environment. In: Nguyen, N.T., Hoang, B.H., Huynh, C.P., Hwang, D., Trawiński, B., Vossen, G. (eds) Computational Collective Intelligence. ICCCI 2020. Lecture Notes in Computer Science, vol. 12496. Springer, Cham. https://doi.org/10.1007/978-3-030-63007-2_30
DOI: https://doi.org/10.1007/978-3-030-63007-2_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63006-5
Online ISBN: 978-3-030-63007-2