Abstract
A class of reinforcement models termed Temporal Difference (TD) models has been developed on theoretical grounds as effective algorithms for various learning situations. Based on the observation that learning depends on the unpredictability of primary motivating events, these models use errors in the prediction of reinforcing events as teaching signals. Independent of the theoretical work, neurophysiological experiments have revealed that neurons in the mammalian midbrain that use the neurotransmitter dopamine process information about rewards and reward-predicting stimuli in a manner very similar to the teaching signal of TD models.
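The role of the prediction error as a teaching signal can be illustrated with a minimal sketch of tabular TD(0) learning on a simulated cue-then-reward trial. This is not the model from the paper: the function name `run_td_conditioning`, the trial layout (cue at step 5, reward at step 15), the learning rate, the discount factor, and the one-weight-per-time-step stimulus representation are all assumptions chosen for illustration.

```python
import numpy as np


def run_td_conditioning(n_trials=200, n_steps=20, cs_time=5, reward_time=15,
                        alpha=0.3, gamma=1.0):
    """Tabular TD(0) on a cue-then-reward trial (illustrative assumptions only).

    Returns an array of shape (n_trials, n_steps) holding the prediction
    error observed at each time step of each trial.
    """
    # One weight per time step since cue onset. Before the cue, the
    # prediction is held at 0 because cue onset itself is unpredictable.
    w = np.zeros(n_steps - cs_time)
    deltas = np.zeros((n_trials, n_steps))

    def value(t):
        # Predicted future reward at time step t.
        if t < cs_time or t >= n_steps:
            return 0.0
        return w[t - cs_time]

    for trial in range(n_trials):
        for t in range(n_steps - 1):
            # Reward of size 1 is received on arrival at reward_time.
            r = 1.0 if t + 1 == reward_time else 0.0
            # TD error: received reward plus new prediction minus old prediction.
            delta = r + gamma * value(t + 1) - value(t)
            # Attribute the error to the step at which the outcome is observed.
            deltas[trial, t + 1] = delta
            # The error is the teaching signal: it adjusts the old prediction.
            if t >= cs_time:
                w[t - cs_time] += alpha * delta
    return deltas


if __name__ == "__main__":
    d = run_td_conditioning()
    print("first trial, error at cue / reward:", d[0, 5], d[0, 15])
    print("last trial,  error at cue / reward:", d[-1, 5], d[-1, 15])
```

Under these assumptions, the error on the first trial occurs at the time of the unpredicted reward. After training, it should occur at the onset of the reward-predicting cue instead, while the fully predicted reward produces almost no error. This shift from the reward to the predicting stimulus is the property that makes the TD teaching signal comparable to the dopamine responses described above.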
© 1997 Springer-Verlag Berlin Heidelberg
Schultz, W. (1997). Reward responses of dopamine neurons: A biological reinforcement signal. In: Gerstner, W., Germond, A., Hasler, M., Nicoud, JD. (eds) Artificial Neural Networks — ICANN'97. ICANN 1997. Lecture Notes in Computer Science, vol 1327. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0020125
DOI: https://doi.org/10.1007/BFb0020125
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63631-1
Online ISBN: 978-3-540-69620-9
eBook Packages: Springer Book Archive