ABSTRACT
Neuroscience provides a rich source of inspiration for new algorithms and architectures to employ when building AI, and the resulting biologically-plausible approaches offer formal, testable models of brain function. The working memory toolkit (WMtk) was developed to ease the integration of an artificial neural network (ANN)-based computational neuroscience model of working memory into reinforcement learning (RL) agents, hiding the details of ANN design behind a simple symbolic encoding interface. While the WMtk allows RL agents to perform well in partially-observable domains, it requires the programmer to prefilter sensory information: a task often delegated to dimensional attention mechanisms in other cognitive architectures. To fill this gap, we develop and test a biologically-plausible dimensional attention filter for the WMtk and validate its performance on a partially-observable 1D maze task. We show that the attention filter improves learning behavior in two ways: 1) it speeds up learning early in training, and 2) it develops emergent alternative strategies which optimize performance over the long term.
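As a rough illustration of the architecture the abstract describes, the following is a minimal, self-contained Python sketch: a linear Q-learner on a partially-observable 1D corridor whose combined observation-plus-memory feature vector is gated by learned dimensional attention gains. Every name here (`observe`, `attn`, the cue-driven memory gate) is an illustrative assumption, not the WMtk interface, and the attention update is a simple TD-gradient rule standing in for the paper's specific filter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1D signal maze: a cue on the first step tells the agent
# which end of the corridor is rewarded; the cue then disappears, so the
# agent must retain it in a working-memory slot.
N = 7                                   # corridor length

def reset():
    goal = int(rng.integers(2))         # 0 = left end rewarded, 1 = right
    return N // 2, goal                 # start in the middle

def observe(pos, goal, t):
    x = np.zeros(N + 2)                 # position one-hot + two cue bits
    x[pos] = 1.0
    if t == 0:                          # cue visible only on the first step
        x[N + goal] = 1.0
    return x

def step(pos, action):                  # action: 0 = move left, 1 = move right
    pos += -1 if action == 0 else 1
    return pos, pos in (0, N - 1)       # new position, episode done?

# Linear Q-learning over attention-gated features. The feature vector is
# the current observation concatenated with the working-memory contents;
# `attn` holds one nonnegative gain per dimension (the attention filter).
D = 2 * (N + 2)
W = np.zeros((2, D))                    # Q-weights, one row per action
attn = np.ones(D)                       # dimensional attention gains
alpha, gamma, eps = 0.05, 0.95, 0.1

for episode in range(3000):
    pos, goal = reset()
    mem = np.zeros(N + 2)
    for t in range(2 * N):
        obs = observe(pos, goal, t)
        if obs[N:].any():               # naive gate: store cue-bearing input
            mem = obs.copy()
        raw = np.concatenate([obs, mem])
        phi = attn * raw
        a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(W @ phi))
        pos, done = step(pos, a)
        r = 1.0 if done and pos == (0 if goal == 0 else N - 1) else 0.0
        phi2 = attn * np.concatenate([observe(pos, goal, t + 1), mem])
        delta = (r if done else r + gamma * np.max(W @ phi2)) - W[a] @ phi
        grad_attn = W[a] * raw          # dQ/d(attn) = W[a] * raw
        W[a] += alpha * delta * phi     # dQ/dW[a]   = attn * raw = phi
        attn += alpha * delta * grad_attn
        attn = np.clip(attn, 0.0, None) # keep attention gains nonnegative
        if done:
            break
```

Gating both the raw observation and the memory slot through the same attention vector lets the learner suppress uninformative position bits while keeping the retained cue dimensions amplified, which is the qualitative behavior the abstract attributes to the filter.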