ViZDoom: DRQN with Prioritized Experience Replay, Double-Q Learning and Snapshot Ensembling

Schulze, Christopher; Schulze, Marcus

doi:10.1007/978-3-030-01054-6_1

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 868)

Included in the following conference series:

Proceedings of SAI Intelligent Systems Conference

Abstract

ViZDoom is a robust first-person shooter reinforcement learning environment characterized by a significant degree of latent state information. In this paper, double-Q learning and prioritized experience replay are tested in a ViZDoom combat scenario using a competitive deep recurrent Q-network (DRQN) architecture. In addition, snapshot ensembling is employed with a specific annealed learning rate schedule to observe differences in ensembling efficacy under these two methods. Annealed learning rates are important in general to the training of deep neural network models, as they perturb the model and counter its tendency to settle into local optima. Both variants outperform the game's built-in AI agents. The known stabilizing effects of double-Q learning are illustrated, and prioritized experience replay is again shown to be useful, producing results early in agent development, with the caveat that value overestimation is accelerated in this case. In addition, some unique behaviors are observed to develop for the prioritized experience replay (PER) and double-Q (DDQ) variants, and snapshot ensembling of both PER and DDQ proves a valuable method for improving the performance of the ViZDoom Marine.
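
For readers unfamiliar with the techniques named in the abstract, the following is a minimal NumPy sketch of their core computations, not the authors' implementation. The function names, the hyperparameters (alpha, eps, lr_max, n_cycles), and the choice to ensemble by averaging Q-values across snapshots are all assumptions made for illustration. The double-Q target follows the general form in Van Hasselt et al., the sampling probabilities follow the proportional variant in Schaul et al., and the schedule follows the cyclic cosine annealing in Huang et al.

```python
import math
import numpy as np

def double_q_target(reward, done, gamma, q_online_next, q_target_next):
    """Double-Q target: the online network selects the next action,
    the target network evaluates it (reduces value overestimation)."""
    best_action = int(np.argmax(q_online_next))
    bootstrap = 0.0 if done else gamma * float(q_target_next[best_action])
    return reward + bootstrap

def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: P(i) = p_i^alpha / sum_k p_k^alpha,
    with p_i = |td_error_i| + eps. alpha and eps are illustrative values."""
    priorities = (np.abs(td_errors) + eps) ** alpha
    return priorities / priorities.sum()

def cyclic_cosine_lr(step, total_steps, n_cycles, lr_max):
    """Cyclic cosine annealing: the rate restarts at lr_max at the start of
    each cycle and decays toward zero; a snapshot is saved at each cycle end."""
    cycle_len = math.ceil(total_steps / n_cycles)
    pos = (step % cycle_len) / cycle_len
    return 0.5 * lr_max * (math.cos(math.pi * pos) + 1.0)

def ensemble_action(q_values_per_snapshot):
    """Assumed ensembling rule: average the Q-values of all snapshots,
    then act greedily on the average."""
    mean_q = np.mean(np.stack(q_values_per_snapshot), axis=0)
    return int(np.argmax(mean_q))

# Example with made-up numbers
q_online = np.array([1.0, 3.0, 2.0])   # online net prefers action 1
q_target = np.array([0.5, 1.5, 4.0])   # target net scores action 1 at 1.5
target = double_q_target(1.0, False, 0.99, q_online, q_target)
probs = per_probabilities(np.array([0.1, 1.0, -2.0]))
```

In this example the target bootstraps from the target network's value for action 1 (the online network's choice) rather than from its own maximum at action 2, which is the mechanism behind the stabilizing effect the abstract refers to.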

This work was not supported by any organization.

Acknowledgment

The authors would like to thank (1) the ViZDoom development team for their continued maintenance and extension of this remarkable RL framework, and (2) IEEE CIG for its support of the annual ViZDoom Limited and Full Deathmatch Competitions.

Author information

Authors and Affiliations

Austin, USA
Christopher Schulze & Marcus Schulze

Corresponding author

Correspondence to Christopher Schulze.

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Saga University, Saga, Japan
Kohei Arai
The Science and Information (SAI) Organization, Bradford, UK
Supriya Kapoor
The Science and Information (SAI) Organization, Bradford, UK
Rahul Bhatia

Cite this paper

Schulze, C., Schulze, M. (2019). ViZDoom: DRQN with Prioritized Experience Replay, Double-Q Learning and Snapshot Ensembling. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2018. Advances in Intelligent Systems and Computing, vol 868. Springer, Cham. https://doi.org/10.1007/978-3-030-01054-6_1

DOI: https://doi.org/10.1007/978-3-030-01054-6_1
Published: 09 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01053-9
Online ISBN: 978-3-030-01054-6
eBook Packages: Intelligent Technologies and Robotics (R0)
