Abstract
Visual reinforcement learning implies that decision-making policies are identified under delayed rewards from an environment, while state information takes the form of high-dimensional data, such as video. Moreover, although the video might characterize a 3D world in high resolution, partial observability places significant limits on what the agent can actually perceive of the world. This means that the agent also has to: (1) provide efficient encodings of state, (2) store these encodings of state efficiently in some form of memory, and (3) recall such memories after arbitrary delays for decision making. In this work, we demonstrate how an external memory model facilitates decision making in the complex world of multi-agent ‘deathmatches’ in the ViZDoom first-person shooter environment. ViZDoom provides a complex environment of multiple rooms and resources in which agents are spawned at multiple different locations. A unique approach is adopted to defining external memory for genetic programming agents in which: (1) the state of memory is shared across all programs; (2) writing is formulated as a probabilistic process, resulting in different regions of memory exhibiting short- versus long-term behaviour; and (3) read operations are indexed, enabling programs to identify regions of external memory with specific temporal properties. We demonstrate that agents purposefully navigate the world when external memory is provided, whereas those without external memory are limited to mere ‘fight or flight’ behaviour.
Notes
- 1.
- 2.
Both TPG and M-TPG support per-program stateful scalar memory, i.e. a limited form of memory in which programs are unaware of each other’s state.
- 3.
Short-term memory is located at indexes near ‘50’, long-term at indexes near ‘1’ and ‘100’.
- 4.
Conversely, deep learning solutions downsampled to an \(84 \times 84\) state space.
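The probabilistic write scheme described in the abstract and Note 3 can be illustrated with a minimal sketch. All specifics below are assumptions for illustration, not the paper’s actual parameters: the memory size of 100 cells, the Gaussian write profile centred on index 50, and the function names (`write_probabilities`, `probabilistic_write`) are hypothetical.

```python
import numpy as np

MEM_SIZE = 100  # illustrative; indexes 1..100 as in Note 3

def write_probabilities(size=MEM_SIZE, width=15.0):
    """Per-index write probability: highest near the middle of memory
    (frequently overwritten -> short-term), lowest near the ends
    (rarely overwritten -> long-term), matching Note 3."""
    centre = (size - 1) / 2.0
    idx = np.arange(size)
    return np.exp(-0.5 * ((idx - centre) / width) ** 2)

def probabilistic_write(memory, value, rng):
    """Overwrite each cell with `value` independently, with probability
    determined by the cell's position. The memory array itself is
    shared across all programs."""
    p = write_probabilities(memory.size)
    mask = rng.random(memory.size) < p
    memory[mask] = value
    return memory

rng = np.random.default_rng(0)
memory = np.zeros(MEM_SIZE)
memory = probabilistic_write(memory, 1.0, rng)
# Cells near index 50 are overwritten often (short-term memory);
# cells near indexes 1 and 100 are rarely touched (long-term memory).
# Indexed reads then let a program target a temporal horizon by choice
# of index alone.
```

Under this sketch, repeated writes churn the centre of memory while the extremes decay slowly, which is one simple way to realise the short- versus long-term regions the paper describes.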
References
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
Hessel, M., et al.: Rainbow: combining improvements in deep reinforcement learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 3215–3222 (2018)
Kelly, S., Heywood, M.I.: Emergent tangled graph representations for Atari game playing agents. In: McDermott, J., Castelli, M., Sekanina, L., Haasdijk, E., García-Sánchez, P. (eds.) EuroGP 2017. LNCS, vol. 10196, pp. 64–79. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-55696-3_5
Kelly, S., Heywood, M.I.: Emergent solutions to high-dimensional multitask reinforcement learning. Evol. Comput. 26(3), 347–380 (2018)
Wilson, D.G., Cussat-Blanc, S., Luga, H., Miller, J.F.: Evolving simple programs for playing Atari games. In: ACM Genetic and Evolutionary Computation Conference, pp. 229–236 (2018)
Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013)
Graves, A., Wayne, G., Danihelka, I.: Neural Turing machines. CoRR abs/1410.5401 (2014)
Greve, R.B., Jacobsen, E.J., Risi, S.: Evolving neural Turing machines for reward-based learning. In: ACM Genetic and Evolutionary Computation Conference, pp. 117–124 (2016)
Merrild, J., Rasmussen, M.A., Risi, S.: HyperNTM: evolving scalable neural Turing machines through HyperNEAT. In: Sim, K., Kaufmann, P. (eds.) EvoApplications 2018. LNCS, vol. 10784, pp. 750–766. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77538-8_50
Jaderberg, M., et al.: Human-level performance in first-person multiplayer games with population-based deep reinforcement learning. CoRR abs/1807.01281 (2018)
Nordin, P.: A compiling genetic programming system that directly manipulates the machine code. In: Kinnear, K.E. (ed.) Advances in Genetic Programming, pp. 311–332. MIT Press, Amsterdam (1994)
Huelsbergen, L.: Toward simulated evolution of machine language iteration. In: Proceedings of the Annual Conference on Genetic Programming, pp. 315–320 (1996)
Haddadi, F., Kayacik, H.G., Zincir-Heywood, A.N., Heywood, M.I.: Malicious automatically generated domain name detection using stateful-SBB. In: Esparcia-Alcázar, A.I. (ed.) EvoApplications 2013. LNCS, vol. 7835, pp. 529–539. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37192-9_53
Agapitos, A., Brabazon, A., O’Neill, M.: Genetic programming with memory for financial trading. In: Squillero, G., Burelli, P. (eds.) EvoApplications 2016. LNCS, vol. 9597, pp. 19–34. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-31204-0_2
Teller, A.: Turing completeness in the language of genetic programming with indexed memory. In: IEEE Congress on Evolutionary Computation, pp. 136–141 (1994)
Teller, A.: The evolution of mental models. In: Kinnear, K.E. (ed.) Advances in Genetic Programming, pp. 199–220. MIT Press, Amsterdam (1994)
Langdon, W.B.: Genetic Programming and Data Structures. Kluwer Academic, Dordrecht (1998)
Andre, D.: Evolution of mapmaking ability: strategies for the evolution of learning, planning, and memory using genetic programming. In: IEEE World Congress on Computational Intelligence, pp. 250–255 (1994)
Brave, S.: The evolution of memory and mental models using genetic programming. In: Proceedings of the Annual Conference on Genetic Programming (1996)
Nordin, P., Banzhaf, W., Brameier, M.: Evolution of a world model for a miniature robot using genetic programming. Robot. Auton. Syst. 25, 105–116 (1998)
Spector, L., Luke, S.: Cultural transmission of information in genetic programming. In: Annual Conference on Genetic Programming, pp. 209–214 (1996)
Kelly, S., Heywood, M.I.: Multi-task learning in Atari video games with emergent tangled program graphs. In: ACM Genetic and Evolutionary Computation Conference, pp. 195–202 (2017)
Lichodzijewski, P., Heywood, M.I.: Symbiosis, complexification and simplicity under GP. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference, pp. 853–860 (2010)
Brameier, M., Banzhaf, W.: Linear Genetic Programming. Springer, New York (2007). https://doi.org/10.1007/978-0-387-31030-5
Kempka, M., Wydmuch, M., Runc, G., Toczek, J., Jaśkowski, W.: ViZDoom: a doom-based AI research platform for visual reinforcement learning. In: IEEE Conference on Computational Intelligence and Games, pp. 1–8 (2016)
Smith, R.J., Heywood, M.I.: Scaling tangled program graphs to visual reinforcement learning in ViZDoom. In: Castelli, M., Sekanina, L., Zhang, M., Cagnoni, S., García-Sánchez, P. (eds.) EuroGP 2018. LNCS, vol. 10781, pp. 135–150. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77553-1_9
Quiroga, R.Q., Kreiman, G., Koch, C., Fried, I.: Sparse but not ‘grandmother-cell’ coding in the medial temporal lobe. Trends Cogn. Sci. 12(3), 87–91 (2008)
Acknowledgments
This research was supported by NSERC grant CRDJ 499792.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Smith, R.J., Heywood, M.I. (2019). A Model of External Memory for Navigation in Partially Observable Visual Reinforcement Learning Tasks. In: Sekanina, L., Hu, T., Lourenço, N., Richter, H., García-Sánchez, P. (eds) Genetic Programming. EuroGP 2019. Lecture Notes in Computer Science(), vol 11451. Springer, Cham. https://doi.org/10.1007/978-3-030-16670-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16669-4
Online ISBN: 978-3-030-16670-0