Abstract
We compare two recent computational models of dopamine activity in the human central nervous system at the level of single cells. Both implement reinforcement learning using the method of temporal differences (TD), and both employ an internal model to address drawbacks of earlier models. The principal difference between the internal models lies in the degree to which they implement the properties of the environment. One employs a partially observable semi-Markov environment; the other applies a form of transition matrix iteratively to generate the sum of future predictions. We show that the internal models rest on fundamentally different assumptions and that these assumptions are problematic in each case. Both models are, to different degrees, underspecified with regard to their biological implementation. In addition, the model employing the partially observable semi-Markov environment seems to have redundant features, whereas the alternative model appears to lack generalizability.
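For readers unfamiliar with the method of temporal differences, the following is a minimal illustrative sketch of tabular TD(0) prediction learning. It is not an implementation of either model compared in the article; it shows only the basic TD scheme both are said to build on. The task (a fixed chain of time steps ending in a single reward), the function name `run_td`, and all parameter values are assumptions chosen for illustration.

```python
def run_td(num_trials=200, num_steps=5, alpha=0.1, gamma=1.0):
    """Tabular TD(0) on a hypothetical chain of time steps ending in a reward of 1.

    Returns the learned values and the TD error observed at reward time
    on every trial.
    """
    # One value per time step, plus a terminal state whose value stays 0.
    values = [0.0] * (num_steps + 1)
    reward_errors = []
    for _ in range(num_trials):
        for t in range(num_steps):
            # Reward is delivered only on the final step of the trial.
            reward = 1.0 if t == num_steps - 1 else 0.0
            # TD error: reward plus discounted next prediction, minus current prediction.
            delta = reward + gamma * values[t + 1] - values[t]
            values[t] += alpha * delta
            if t == num_steps - 1:
                reward_errors.append(delta)
    return values, reward_errors


values, reward_errors = run_td()
print("TD error at reward, first trial:", reward_errors[0])
print("TD error at reward, last trial:", reward_errors[-1])
```

In this sketch the TD error at the time of reward starts at 1.0 on the first trial and shrinks toward zero as the reward becomes predicted. TD-based accounts of dopamine activity attribute this kind of error signal to dopamine neurons.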
Cite this article
Horgan, P., Cummins, F. Modeling dopamine activity by Reinforcement Learning methods: implications from two recent models. Artif Intell Rev 26, 49–62 (2006). https://doi.org/10.1007/s10462-007-9036-3
DOI: https://doi.org/10.1007/s10462-007-9036-3