Modeling dopamine activity by Reinforcement Learning methods: implications from two recent models

Horgan, Patrick; Cummins, Fred

doi:10.1007/s10462-007-9036-3

Published: 14 September 2007

Volume 26, pages 49–62 (2006)

Artificial Intelligence Review

Abstract

We compare and contrast two recent computational models of dopamine activity in the human central nervous system at the level of single cells. Both models implement reinforcement learning using the method of temporal differences (TD). To address drawbacks of earlier models, both employ internal models. The principal difference between these internal models lies in the degree to which they represent the properties of the environment: one employs a partially observable semi-Markov environment, while the other applies a form of transition matrix iteratively to generate the sum of future predictions. We show that the two internal models rest on fundamentally different assumptions and that these assumptions are problematic in each case. Both models leave their biological implementation underspecified, to differing degrees. In addition, the model employing the partially observable semi-Markov environment appears to have redundant features, whereas the other model appears to lack generalizability.
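As a rough illustration of the two ingredients the abstract names, TD learning and summing future predictions through an iterated transition matrix, the following is a minimal sketch in Python with NumPy. It trains tabular TD(0) values on a hypothetical deterministic cue-to-reward chain, then recomputes the same values by repeatedly multiplying a transition matrix. It assumes a tapped-delay-line state representation, a discount `GAMMA`, a learning rate `ALPHA`, and an inter-trial state of value zero. All names and parameters are hypothetical, and the code reproduces neither model's actual internal model: there is no partial observability or semi-Markov timing here.

```python
import numpy as np

# Hypothetical parameters: a cue followed by reward after N_STEPS time steps.
N_STEPS = 5     # time steps from cue onset to reward (tapped delay line)
GAMMA = 0.9     # temporal discount factor
ALPHA = 0.2     # TD learning rate


def td_train(n_trials=500):
    """Tabular TD(0) on a deterministic cue -> reward chain.

    State t means 't steps since cue onset'; reward 1.0 is delivered in the
    last state, after which the trial ends (terminal value 0).
    """
    v = np.zeros(N_STEPS)
    for _ in range(n_trials):
        for t in range(N_STEPS):
            r = 1.0 if t == N_STEPS - 1 else 0.0
            v_next = v[t + 1] if t + 1 < N_STEPS else 0.0
            delta = r + GAMMA * v_next - v[t]  # TD prediction error
            v[t] += ALPHA * delta
    return v


def td_errors(v):
    """TD errors at cue onset and at reward delivery for learned values v."""
    # Assumption: cue onset is unpredictable, so the inter-trial state before
    # the cue has value 0 and yields no reward.
    delta_cue = 0.0 + GAMMA * v[0] - 0.0
    # Reward state: r = 1, and the next (terminal) value is 0.
    delta_reward = 1.0 + GAMMA * 0.0 - v[-1]
    return delta_cue, delta_reward


def value_by_matrix_iteration(T, r, gamma, n_iter=200):
    """Sum of discounted future predictions, v = sum_k (gamma * T)^k r,
    accumulated one matrix multiplication at a time."""
    v = np.zeros_like(r, dtype=float)
    term = np.asarray(r, dtype=float).copy()
    for _ in range(n_iter):
        v += term
        term = gamma * (T @ term)
    return v


if __name__ == "__main__":
    v_td = td_train()
    print("TD-learned values:", np.round(v_td, 4))
    print("TD error at cue, at reward:", td_errors(v_td))

    T = np.eye(N_STEPS, k=1)   # state t -> t+1; the last state is terminal
    r = np.zeros(N_STEPS)
    r[-1] = 1.0                # expected immediate reward in each state
    v_mat = value_by_matrix_iteration(T, r, GAMMA)
    v_closed = np.linalg.solve(np.eye(N_STEPS) - GAMMA * T, r)
    print("Matrix-iteration values:", np.round(v_mat, 4))
    print("Closed form matches:", np.allclose(v_mat, v_closed))
```

With a fully predictable reward, the learned TD error at reward delivery shrinks toward zero while a positive error appears at cue onset, which is the pattern TD accounts of dopamine activity aim to reproduce. Because the transition matrix in this toy case is a pure shift, the iterated sum terminates after `N_STEPS` terms and agrees with both the TD-learned values and the closed form (I − γT)⁻¹r.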

Author information

Authors and Affiliations

UCD School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland
Patrick Horgan & Fred Cummins
Neuroscience and Psychiatry Unit, University of Manchester, G.714 Stopford Building, Oxford Road, Manchester, M13 9PT, UK
Patrick Horgan

Corresponding author

Correspondence to Patrick Horgan.

About this article

Cite this article

Horgan, P., Cummins, F. Modeling dopamine activity by Reinforcement Learning methods: implications from two recent models. Artif Intell Rev 26, 49–62 (2006). https://doi.org/10.1007/s10462-007-9036-3

Received: 01 October 2006
Revised: 30 October 2006
Published: 14 September 2007
Issue Date: October 2006
DOI: https://doi.org/10.1007/s10462-007-9036-3
