Abstract
Visual reinforcement learning implies that decision-making policies are identified under delayed rewards from an environment, while state information takes the form of high-dimensional data, such as video. Moreover, although the video might characterize a 3D world in high resolution, partial observability places significant limits on what the agent can actually perceive of the world. This means that the agent also has to: (1) provide efficient encodings of state, (2) store these encodings of state efficiently in some form of memory, and (3) recall such memories after arbitrary delays for decision making. In this work, we demonstrate how an external memory model facilitates decision making in the complex world of multi-agent ‘deathmatches’ in the ViZDoom first-person shooter environment. ViZDoom provides a complex environment of multiple rooms and resources in which agents are spawned at multiple different locations. A unique approach is adopted to defining external memory for genetic programming agents in which: (1) the state of memory is shared across all programs; (2) writing is formulated as a probabilistic process, resulting in different regions of memory exhibiting short- versus long-term behaviour; and (3) read operations are indexed, enabling programs to identify regions of external memory with specific temporal properties. We demonstrate that agents purposefully navigate the world when external memory is provided, whereas those without external memory are limited to mere ‘fight or flight’ behaviour.
Notes
- 1.
- 2.
Both TPG and M-TPG support per-program stateful scalar memory, i.e. a limited form of memory in which programs are unaware of each other’s state.
- 3.
Short-term memory is located at indexes near ‘50’, long-term at indexes near ‘1’ and ‘100’.
- 4.
Conversely, deep learning solutions downsampled to an \(84 \times 84\) state space.
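The probabilistic write scheme described in the abstract and Note 3 can be illustrated with a minimal sketch. All specifics below are assumptions for illustration, not the paper’s actual parameters: the memory size of 100 cells, the Gaussian write profile centred on index 50, and the function names (`write_probabilities`, `probabilistic_write`) are hypothetical.

```python
import numpy as np

MEM_SIZE = 100  # illustrative; indexes 1..100 as in Note 3

def write_probabilities(size=MEM_SIZE, width=15.0):
    """Per-index write probability: highest near the middle of memory
    (frequently overwritten -> short-term), lowest near the ends
    (rarely overwritten -> long-term), matching Note 3."""
    centre = (size - 1) / 2.0
    idx = np.arange(size)
    return np.exp(-0.5 * ((idx - centre) / width) ** 2)

def probabilistic_write(memory, value, rng):
    """Overwrite each cell with `value` independently, with probability
    determined by the cell's position. The memory array itself is
    shared across all programs."""
    p = write_probabilities(memory.size)
    mask = rng.random(memory.size) < p
    memory[mask] = value
    return memory

rng = np.random.default_rng(0)
memory = np.zeros(MEM_SIZE)
memory = probabilistic_write(memory, 1.0, rng)
# Cells near index 50 are overwritten often (short-term memory);
# cells near indexes 1 and 100 are rarely touched (long-term memory).
# Indexed reads then let a program target a temporal horizon by choice
# of index alone.
```

Under this sketch, repeated writes churn the centre of memory while the extremes decay slowly, which is one simple way to realise the short- versus long-term regions the paper describes.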
References
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
Hessel, M., et al.: Rainbow: combining improvements in deep reinforcement learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 3215–3222 (2018)
Kelly, S., Heywood, M.I.: Emergent tangled graph representations for Atari game playing agents. In: McDermott, J., Castelli, M., Sekanina, L., Haasdijk, E., García-Sánchez, P. (eds.) EuroGP 2017. LNCS, vol. 10196, pp. 64–79. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-55696-3_5
Kelly, S., Heywood, M.I.: Emergent solutions to high-dimensional multitask reinforcement learning. Evol. Comput. 26(3), 347–380 (2018)
Wilson, D.G., Cussat-Blanc, S., Luga, H., Miller, J.F.: Evolving simple programs for playing Atari games. In: ACM Genetic and Evolutionary Computation Conference, pp. 229–236 (2018)
Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013)
Graves, A., Wayne, G., Danihelka, I.: Neural Turing machines. CoRR abs/1410.5401 (2014)
Greve, R.B., Jacobsen, E.J., Risi, S.: Evolving neural Turing machines for reward-based learning. In: ACM Genetic and Evolutionary Computation Conference, pp. 117–124 (2016)
Merrild, J., Rasmussen, M.A., Risi, S.: HyperNTM: evolving scalable neural Turing machines through HyperNEAT. In: Sim, K., Kaufmann, P. (eds.) EvoApplications 2018. LNCS, vol. 10784, pp. 750–766. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77538-8_50
Jaderberg, M., et al.: Human-level performance in first-person multiplayer games with population-based deep reinforcement learning. CoRR abs/1807.01281 (2018)
Nordin, P.: A compiling genetic programming system that directly manipulates the machine code. In: Kinnear, K.E. (ed.) Advances in Genetic Programming, pp. 311–332. MIT Press, Amsterdam (1994)
Huelsbergen, L.: Toward simulated evolution of machine language iteration. In: Proceedings of the Annual Conference on Genetic Programming, pp. 315–320 (1996)
Haddadi, F., Kayacik, H.G., Zincir-Heywood, A.N., Heywood, M.I.: Malicious automatically generated domain name detection using stateful-SBB. In: Esparcia-Alcázar, A.I. (ed.) EvoApplications 2013. LNCS, vol. 7835, pp. 529–539. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37192-9_53
Agapitos, A., Brabazon, A., O’Neill, M.: Genetic programming with memory for financial trading. In: Squillero, G., Burelli, P. (eds.) EvoApplications 2016. LNCS, vol. 9597, pp. 19–34. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-31204-0_2
Teller, A.: Turing completeness in the language of genetic programming with indexed memory. In: IEEE Congress on Evolutionary Computation, pp. 136–141 (1994)
Teller, A.: The evolution of mental models. In: Kinnear, K.E. (ed.) Advances in Genetic Programming, pp. 199–220. MIT Press, Amsterdam (1994)
Langdon, W.B.: Genetic Programming and Data Structures. Kluwer Academic, Dordrecht (1998)
Andre, D.: Evolution of mapmaking ability: strategies for the evolution of learning, planning, and memory using genetic programming. In: IEEE World Congress on Computational Intelligence, pp. 250–255 (1994)
Brave, S.: The evolution of memory and mental models using genetic programming. In: Proceedings of the Annual Conference on Genetic Programming (1996)
Nordin, P., Banzhaf, W., Brameier, M.: Evolution of a world model for a miniature robot using genetic programming. Robot. Auton. Syst. 25, 105–116 (1998)
Spector, L., Luke, S.: Cultural transmission of information in genetic programming. In: Annual Conference on Genetic Programming, pp. 209–214 (1996)
Kelly, S., Heywood, M.I.: Multi-task learning in Atari video games with emergent tangled program graphs. In: ACM Genetic and Evolutionary Computation Conference, pp. 195–202 (2017)
Lichodzijewski, P., Heywood, M.I.: Symbiosis, complexification and simplicity under GP. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference, pp. 853–860 (2010)
Brameier, M., Banzhaf, W.: Linear Genetic Programming. Springer, New York (2007). https://doi.org/10.1007/978-0-387-31030-5
Kempka, M., Wydmuch, M., Runc, G., Toczek, J., Jaśkowski, W.: ViZDoom: a doom-based AI research platform for visual reinforcement learning. In: IEEE Conference on Computational Intelligence and Games, pp. 1–8 (2016)
Smith, R.J., Heywood, M.I.: Scaling tangled program graphs to visual reinforcement learning in ViZDoom. In: Castelli, M., Sekanina, L., Zhang, M., Cagnoni, S., García-Sánchez, P. (eds.) EuroGP 2018. LNCS, vol. 10781, pp. 135–150. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77553-1_9
Quiroga, R.Q., Kreiman, G., Koch, C., Fried, I.: Sparse but not ‘grandmother-cell’ coding in the medial temporal lobe. Trends Cogn. Sci. 12(3), 87–91 (2008)
Acknowledgments
This research was supported by NSERC grant CRDJ 499792.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Smith, R.J., Heywood, M.I. (2019). A Model of External Memory for Navigation in Partially Observable Visual Reinforcement Learning Tasks. In: Sekanina, L., Hu, T., Lourenço, N., Richter, H., García-Sánchez, P. (eds) Genetic Programming. EuroGP 2019. Lecture Notes in Computer Science(), vol 11451. Springer, Cham. https://doi.org/10.1007/978-3-030-16670-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16669-4
Online ISBN: 978-3-030-16670-0