
Neurocomputing

Volume 461, 21 October 2021, Pages 635-656

Learning continuous-time working memory tasks with on-policy neural reinforcement learning

https://doi.org/10.1016/j.neucom.2020.11.072

Open access under a Creative Commons license.

Abstract

An animal's ability to learn how to make decisions based on sensory evidence is often well described by Reinforcement Learning (RL) frameworks. These frameworks, however, typically apply to event-based representations and lack the explicit and fine-grained notion of time needed to study psychophysically relevant measures such as reaction times and psychometric curves. Here, we develop and use a biologically plausible continuous-time RL scheme, CT-AuGMEnT (Continuous-Time Attention-Gated MEmory Tagging), to study these behavioural quantities. We show how CT-AuGMEnT implements on-policy SARSA learning as a biologically plausible form of reinforcement learning with working memory units, using ‘attentional’ feedback. We show that the CT-AuGMEnT model efficiently learns tasks in continuous time and can learn to accumulate relevant evidence through time. This allows the model to link task difficulty to psychophysical measurements such as accuracy and reaction times. We further show how implementing a separate accessory network for feedback allows the model to learn continuously, even when there are significant transmission delays between the network's feedforward and feedback layers and even when the accessory network is randomly initialized. Our results demonstrate that CT-AuGMEnT represents a fully time-continuous, biologically plausible, end-to-end RL model for learning to integrate evidence and make decisions.
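The abstract refers to on-policy SARSA learning carried out in continuous time. As a minimal illustrative sketch (not the authors' CT-AuGMEnT implementation), the on-policy rule can be discretized on a time grid of width `dt`, with the discount factor applied per unit of time as `gamma ** dt`; the function name and parameters below are hypothetical:

```python
import numpy as np

def sarsa_step(Q, s, a, r, s_next, a_next, dt,
               alpha=0.1, gamma=0.9, terminal=False):
    """One discretized on-policy SARSA update on a time grid of width dt.

    Reward is treated as a rate (accrued as r * dt) and discounting is
    applied per unit of time via gamma ** dt, so that shrinking dt
    approaches a continuous-time TD update. Illustrative sketch only.
    """
    # Bootstrap from the action actually taken next (on-policy), unless
    # the episode terminated during this time step.
    target = r * dt + (0.0 if terminal else (gamma ** dt) * Q[s_next, a_next])
    td_error = target - Q[s, a]
    Q[s, a] += alpha * td_error
    return td_error

# Usage on a toy two-state, two-action problem:
Q = np.zeros((2, 2))
delta = sarsa_step(Q, s=0, a=0, r=1.0, s_next=1, a_next=0,
                   dt=0.1, terminal=True)
```

As `dt` shrinks, each update becomes smaller but more frequent, which is what allows reaction times to be read off at the model's native temporal resolution rather than at event boundaries.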

Keywords

Reinforcement learning
Neural networks
Working memory
Selective attention
Continuous-time SARSA


Dr. Davide Zambrano (M) is a senior post-doc in the Laboratory of Intelligent Systems at the École polytechnique fédérale de Lausanne (EPFL). He received his M.Sc. degree (cum laude) in biomedical engineering and his Ph.D. degree in health technologies from the University of Pisa, Pisa, Italy, in 2008 and 2012, respectively. He worked as a Ph.D. student and post-doc at The BioRobotics Institute of the Scuola Superiore Sant'Anna, Pisa, Italy. From 2014 to 2018 he worked as a post-doc in the CWI Machine Learning group with Prof. Sander Bohte on spiking neural networks and biologically plausible reinforcement learning models.

Prof. Pieter Roelfsema (M) is director of the Netherlands Institute for Neuroscience, where he also heads the “Vision & Cognition” lab. Additionally, he is a part-time professor at the University of Amsterdam and at the Free University Amsterdam. He investigates how neurons in different brain areas work together during visual cognition, and he proposed the influential theory that the processing of visual stimuli occurs in different phases with different contributions of feedforward and feedback connections. Roelfsema has received many awards, including the NWO VICI award and an EU ERC Advanced Grant.

Prof. Dr. Sander M. Bohté (M) is a senior researcher and PI in the CWI Machine Learning group, and also a part-time professor of Cognitive Computational Neuroscience at the University of Amsterdam and of Bio-Inspired Neural Networks at the Rijksuniversiteit Groningen, The Netherlands. He received his Ph.D. in 2003 on the topic of spiking neural networks and worked as a post-doc with Prof. Dr. Michael Mozer at the University of Colorado, Boulder, USA. Since 2016 he has been part of the CWI Machine Learning group, where his research bridges neuroscience and bio-inspired neural networks.