An implementation of reinforcement learning based on spike timing dependent plasticity

Roberts, Patrick D.; Santiago, Roberto A.; Lafferriere, Gerardo

doi:10.1007/s00422-008-0265-6

Original Paper
Published: 22 October 2008

Volume 99, pages 517–523, (2008)

Biological Cybernetics

Patrick D. Roberts¹,
Roberto A. Santiago² &
Gerardo Lafferriere³

Abstract

An explanatory model is developed to show how synaptic learning mechanisms modeled through spike-timing dependent plasticity (STDP) can result in long-term adaptations consistent with reinforcement learning models. In particular, the reinforcement learning model known as temporal difference (TD) learning has been used to model neuronal behavior in the orbitofrontal cortex (OFC) and ventral tegmental area (VTA) of macaque monkeys during reinforcement learning. While some research has observed, empirically, a connection between STDP and TD, there has not been an explanatory model directly connecting TD to STDP. Through analysis of the learning dynamics that result from a general form of an STDP learning rule, the connection between STDP and TD is explained. We further demonstrate that an STDP learning rule drives the spike probability of a reward-predicting neuronal population to a stable equilibrium. The equilibrium solution has an increasing slope whose steepness predicts the probability of the reward, similar to the results from electrophysiological recordings of Montague and Berns [Neuron 36(2):265–284, 2002], which suggest a different slope that predicts the value of the anticipated reward. This connection begins to shed light on more recent data gathered from VTA and OFC that are not well modeled by TD. We suggest that STDP provides the underlying mechanism for explaining reinforcement learning and other higher-level perceptual and cognitive functions.
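For readers unfamiliar with TD learning, the following is a minimal sketch of the textbook tabular TD(0) update (in the sense of Sutton and Barto 1998) on a fixed cue-to-reward sequence with a probabilistic reward. It is not the STDP model analyzed in the paper, whose equations are not reproduced on this page. The function name `td0_values` and all parameter values are hypothetical choices made for illustration only.

```python
import random


def td0_values(n_steps=5, reward_prob=0.75, alpha=0.02, gamma=1.0,
               n_trials=5000, seed=0):
    """Tabular TD(0) on a fixed sequence of n_steps states ending in reward.

    Illustrative assumption: a reward of 1 is delivered with probability
    reward_prob on the transition out of the last state, otherwise 0.
    """
    rng = random.Random(seed)
    # One value per state, plus a terminal state whose value stays at 0.
    values = [0.0] * (n_steps + 1)
    for _ in range(n_trials):
        rewarded = rng.random() < reward_prob
        for t in range(n_steps):
            r = 1.0 if (rewarded and t == n_steps - 1) else 0.0
            # TD error: reward plus discounted next value minus current value.
            delta = r + gamma * values[t + 1] - values[t]
            values[t] += alpha * delta
    return values[:-1]


if __name__ == "__main__":
    for p in (0.25, 0.5, 0.75):
        print(p, [round(v, 2) for v in td0_values(reward_prob=p)])
```

With no discounting (`gamma=1.0`), the learned values in this toy setting fluctuate around the reward probability, which is the sense in which a TD value estimate "predicts" a probabilistic reward. How the paper's STDP rule produces comparable behavior is described in the full article, not in this sketch.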

References

Bi GQ, Poo MM (1998) Precise spike timing determines the direction and extent of synaptic modifications in cultured hippocampal neurons. J Neurosci 18: 10464–10472
Daw ND, Dayan P (2004) Neuroscience. Matchmaking. Science 304(5678): 1753–1754
Feldman DE (2000) Timing-based LTP and LTD at vertical inputs to layer II/III pyramidal cells in rat barrel cortex. Neuron 27: 45–56
Froemke RC, Dan Y (2002) Spike-timing-dependent synaptic modification induced by natural spike trains. Nature 416(6879): 433–438
Houk J, Davis J, Beiser D (1995) Models of information processing in the basal ganglia. MIT Press, Cambridge
Izhikevich EM (2007) Solving the distal reward problem through linkage of STDP and dopamine signaling. Cereb Cortex 17(10): 2443–2452
Klopf A (1988) A neuronal model for classical conditioning. Psychobiology 16: 85–125
Kosko B (1986) Differential Hebbian learning. In: Denker JS (ed) AIP Conference Proceedings 151: Neural Networks for Computing. American Institute of Physics, New York, pp 277–288
Markram H, Lübke J, Frotscher M, Sakmann B (1997) Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science 275: 213–215
Montague P, Berns G (2002) Neural economics and the biological substrates of valuation. Neuron 36(2): 265–284
Montague PR, Dayan P, Sejnowski TJ (1996) A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J Neurosci 16(5): 1936–1947
Morris G, Nevet A, Arkadir D, Vaadia E, Bergman H (2006) Midbrain dopamine neurons encode decisions for future action. Nat Neurosci 9: 1057–1063
Niv Y, Duff MO, Dayan P (2005) Dopamine, uncertainty and TD learning. Behav Brain Funct 1: 6
Otani S, Daniel H, Roisin MP, Crepel F (2003) Dopaminergic modulation of long-term synaptic plasticity in rat prefrontal neurons. Cereb Cortex 13(11): 1251–1256
Pawlak V, Kerr J (2008) Dopamine receptor activation is required for corticostriatal spike-timing-dependent plasticity. J Neurosci 28(10): 2435
Rao RPN, Sejnowski TJ (2000) Predictive sequence learning in recurrent neocortical circuits. In: Solla SA, Leen TK, Müller KR (eds) Advances in neural information processing systems, vol 12. MIT Press, Cambridge, pp 164–170
Roberts PD (1999) Computational consequences of temporally asymmetric learning rules: I. Differential Hebbian learning. J Comput Neurosci 7: 235–246
Roberts PD (2004) Recurrent biological neural networks: the weak and noisy limit. Phys Rev E 69: 031910
Roberts PD, Bell CC (2000) Computational consequences of temporally asymmetric learning rules: II. Sensory image cancellation. J Comput Neurosci 9: 67–83
Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275: 1593–1598
Song S, Miller KD, Abbott LF (2000) Competitive Hebbian learning through spike-timing-dependent synaptic plasticity. Nat Neurosci 3: 919–926
Suri R, Schultz W (2001) Temporal difference model reproduces anticipatory neural activity. Neural Comput 13: 841–862
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge
Tremblay L, Schultz W (1999) Relative reward preference in primate orbitofrontal cortex. Nature 398(6729): 661–663
Waelti P, Dickinson A, Schultz W (2001) Dopamine responses comply with basic assumptions of formal learning theory. Nature 412: 43–48
Wörgötter F, Porr B (2005) Temporal sequence learning, prediction, and control: a review of different models and their relation to biological mechanisms. Neural Comput 17(2): 245–319

Author information

Authors and Affiliations

Department of Science and Engineering, Oregon Health and Science University, Portland, OR, 97239, USA
Patrick D. Roberts
Systems Science Program, Portland State University, Portland, OR, 97207, USA
Roberto A. Santiago
Department of Mathematics and Statistics, Portland State University, Portland, OR, 97207, USA
Gerardo Lafferriere

Corresponding author

Correspondence to Patrick D. Roberts.

Additional information

This material is based upon work supported by the National Science Foundation under Grants No. IOB-0445648 (PDR) and DMS-0408334 (GL) and by a Career Support grant from Portland State University (GL).

About this article

Cite this article

Roberts, P.D., Santiago, R.A. & Lafferriere, G. An implementation of reinforcement learning based on spike timing dependent plasticity. Biol Cybern 99, 517–523 (2008). https://doi.org/10.1007/s00422-008-0265-6

Received: 21 February 2008
Accepted: 19 September 2008
Published: 22 October 2008
Issue Date: December 2008
DOI: https://doi.org/10.1007/s00422-008-0265-6
