
Learning from Delayed Reward und Punishment in a Spiking Neural Network Model of Basal Ganglia with Opposing D1/D2 Plasticity

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2012 (ICANN 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7552)


Abstract

Extending previous work, we introduce a spiking actor-critic network model of learning from reward and punishment in the basal ganglia. In the model, the striatum is segregated into populations of medium spiny neurons (MSNs) that express either the D1 or the D2 dopamine receptor type. This segregation allows positive and negative expected outcomes to be represented explicitly within the respective populations. In line with recent experiments, we further assume that D1 and D2 MSN populations have opposing dopamine-modulated bidirectional synaptic plasticity. Experiments were conducted in a grid world, where a moving agent had to reach a remote rewarded goal state. Unlike the previous model, the network learned not only to approach the rewarded goal but also to consistently avoid punishments. The spiking network model explains the functional role of D1/D2 MSN segregation within the striatum, specifically the reversed direction of dopamine-dependent plasticity found at synapses converging on different MSNs.
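The opposing-plasticity idea can be illustrated with a deliberately simplified sketch. This is not the spiking network from the paper: it is a tabular, rate-free abstraction in which a single state has one non-negative D1 weight and one non-negative D2 weight, the state value is their difference, and a temporal-difference error stands in for the dopamine signal. The assignment of D1 to positive and D2 to negative expected outcome, the function names, the learning rate, and the discount factor are all assumptions made for illustration.

```python
def td_error(reward, v_next, v_curr, gamma=0.9):
    """Temporal-difference error, used here as a stand-in for the dopamine signal."""
    return reward + gamma * v_next - v_curr


def value(w_d1, w_d2):
    """State value as the difference of the two non-negative channels (assumption)."""
    return w_d1 - w_d2


def update_d1_d2(w_d1, w_d2, delta, lr=0.1):
    """Apply the same error with opposite sign to the D1 and D2 weights.

    A positive error strengthens D1 and weakens D2; a negative error does
    the reverse. Weights are clipped at zero so each channel stays non-negative.
    """
    new_d1 = max(0.0, w_d1 + lr * delta)
    new_d2 = max(0.0, w_d2 - lr * delta)
    return new_d1, new_d2


def learn_terminal_outcome(reward, steps=50):
    """Repeatedly experience one state that ends with a fixed outcome."""
    w_d1, w_d2 = 0.0, 0.0
    for _ in range(steps):
        # The next state is terminal, so its value is taken to be zero.
        delta = td_error(reward, 0.0, value(w_d1, w_d2))
        w_d1, w_d2 = update_d1_d2(w_d1, w_d2, delta)
    return w_d1, w_d2


if __name__ == "__main__":
    print("rewarded state (D1, D2):", learn_terminal_outcome(+1.0))
    print("punished state (D1, D2):", learn_terminal_outcome(-1.0))
```

Under these assumptions, a repeatedly rewarded state ends up with a large D1 weight and a zero D2 weight, while a repeatedly punished state ends up with the reverse, so a negative expected outcome has its own explicit representation rather than being clipped away.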



Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jitsev, J., Abraham, N., Morrison, A., Tittgemeyer, M. (2012). Learning from Delayed Reward und Punishment in a Spiking Neural Network Model of Basal Ganglia with Opposing D1/D2 Plasticity. In: Villa, A.E.P., Duch, W., Érdi, P., Masulli, F., Palm, G. (eds) Artificial Neural Networks and Machine Learning – ICANN 2012. ICANN 2012. Lecture Notes in Computer Science, vol 7552. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33269-2_58


  • DOI: https://doi.org/10.1007/978-3-642-33269-2_58

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33268-5

  • Online ISBN: 978-3-642-33269-2

  • eBook Packages: Computer Science, Computer Science (R0)
