
Modeling dopamine activity by Reinforcement Learning methods: implications from two recent models

Artificial Intelligence Review

Abstract

We compare and contrast two recent computational models of dopamine activity in the human central nervous system at the level of single cells. Both implement reinforcement learning by the method of temporal differences (TD), and both address drawbacks of earlier TD accounts by incorporating an internal model. The principal difference lies in how fully each internal model represents the properties of the environment: one assumes a partially observable semi-Markov environment, while the other applies a form of transition matrix iteratively to generate the sum of future predictions. We show that the two internal models rest on fundamentally different assumptions, and that those assumptions are problematic in each case. Both models, to differing degrees, leave their biological implementation underspecified. In addition, the model built on the partially observable semi-Markov environment appears to contain redundant features, while the alternative model appears to lack generalizability.
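For readers unfamiliar with TD learning, the sketch below illustrates the core mechanism the two models share: a tabular TD(0) learner whose prediction error, delta = r + gamma * V(s') - V(s), is the quantity standardly mapped onto phasic dopamine responses. This is a minimal illustration only; the toy state chain, reward placement, and parameter values are assumptions for exposition, not features of either model analysed in the paper.

```python
# Minimal tabular TD(0) sketch (illustrative assumptions throughout;
# this is not either paper's internal-model variant).
gamma = 0.9   # discount factor (assumed)
alpha = 0.1   # learning rate (assumed)

# Toy state chain: a predictive cue, a delay, then a rewarded terminal state.
V = {"cue": 0.0, "delay": 0.0, "reward": 0.0}

def step(state):
    """Hypothetical transition function for the toy chain."""
    if state == "cue":
        return "delay", 0.0
    if state == "delay":
        return "reward", 1.0   # reward delivered on reaching the final state
    return None, 0.0           # terminal: episode ends

for _ in range(500):
    s = "cue"
    while s is not None:
        s_next, r = step(s)
        v_next = V[s_next] if s_next is not None else 0.0
        delta = r + gamma * v_next - V[s]   # TD error: the dopamine-like signal
        V[s] += alpha * delta               # move estimate toward the target
        s = s_next

print(V)  # after learning, V["delay"] -> 1.0 and V["cue"] -> gamma * 1.0
```

In this basic scheme the TD error propagates backwards from reward to predictive cue over repeated trials; the internal models compared in the paper extend this scheme to handle timing and partial observability of the environment.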



Author information

Correspondence to Patrick Horgan.



Cite this article

Horgan, P., Cummins, F. Modeling dopamine activity by Reinforcement Learning methods: implications from two recent models. Artif Intell Rev 26, 49–62 (2006). https://doi.org/10.1007/s10462-007-9036-3

