Abstract
This paper presents a biologically constrained reward prediction model that learns cue-outcome associations involving temporally distant stimuli without relying on the widely used temporal difference (TD) model. The model makes novel use of an adapted echo state network in place of the biologically implausible delay chains that dopamine models typically use to represent temporally structured stimuli. The model is also based on a novel algorithm that successfully coordinates two subsystems: one providing Pavlovian conditioning, the other providing timely inhibition of dopamine responses to salient, anticipated stimuli. The model is validated against the typical profile of phasic dopamine activity in first- and second-order Pavlovian conditioning. It is relevant both to explaining the mechanisms underlying the biological regulation of dopamine signals and to applications in autonomous robotics that involve reinforcement-based learning.
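The central idea above, that a recurrent reservoir can stand in for a delay chain, can be illustrated with a generic echo state network: a fixed random tanh reservoir plus a trained linear readout. This is a minimal sketch, not the authors' adapted network or their coordination algorithm. The reservoir size, the spectral-radius scaling, the ridge-regression readout, the reward delay, and the toy `surprise` function are all illustrative assumptions. A single cue pulse drives the reservoir; because its fading memory leaves a different state at each step after the cue, a linear readout can be trained to peak at the assumed reward time.

```python
import math
import random

# All constants below are illustrative assumptions, not values from the paper.
N_UNITS = 200          # reservoir size
SPECTRAL_RADIUS = 0.9  # below 1: a common heuristic for fading memory
TRIAL_LEN = 16         # time steps per trial; the cue arrives at step 0
REWARD_DELAY = 8       # reward arrives this many steps after the cue
RIDGE = 1e-4           # ridge (Tikhonov) regularisation strength


def make_reservoir(n, radius, seed=0):
    """Dense random reservoir. Entries ~ U(-1, 1) have variance 1/3, so by the
    circular law the eigenvalues of U / sqrt(n / 3) roughly fill the unit disk
    for large n; multiplying by `radius` gives a spectral radius near `radius`."""
    rng = random.Random(seed)
    scale = radius / math.sqrt(n / 3.0)
    w = [[rng.uniform(-1.0, 1.0) * scale for _ in range(n)] for _ in range(n)]
    w_in = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    return w, w_in


def run_trial(w, w_in, length):
    """Drive the reservoir with a one-step cue pulse at t = 0; record each state."""
    n = len(w_in)
    x = [0.0] * n
    states = []
    for t in range(length):
        u = 1.0 if t == 0 else 0.0  # cue input: present only at the first step
        x = [math.tanh(sum(w[i][j] * x[j] for j in range(n)) + w_in[i] * u)
             for i in range(n)]
        states.append(x)
    return states


def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):
        z[r] = (aug[r][m] - sum(aug[r][c] * z[c] for c in range(r + 1, m))) / aug[r][r]
    return z


def train_readout(states, targets, ridge):
    """Ridge regression in dual form: w_out = X^T (X X^T + ridge * I)^-1 y."""
    gram = [[sum(a * b for a, b in zip(si, sj)) + (ridge if i == j else 0.0)
             for j, sj in enumerate(states)] for i, si in enumerate(states)]
    alpha = solve(gram, targets)
    n = len(states[0])
    return [sum(alpha[t] * states[t][k] for t in range(len(states))) for k in range(n)]


w, w_in = make_reservoir(N_UNITS, SPECTRAL_RADIUS)
states = run_trial(w, w_in, TRIAL_LEN)
targets = [1.0 if t == REWARD_DELAY else 0.0 for t in range(TRIAL_LEN)]
w_out = train_readout(states, targets, RIDGE)

# Timed expectation of the outcome, read from the reservoir state alone.
expectation = [sum(a * b for a, b in zip(w_out, x)) for x in states]


def surprise(reward_step):
    """Hypothetical dopamine-like burst: reward (1.0) minus the expectation at
    the step the reward arrives, floored at zero."""
    return max(0.0, 1.0 - expectation[reward_step])
```

In this sketch the readout's peak plays the timing role a delay chain would otherwise play: a reward delivered at the expected step yields almost no burst, while the same reward delivered early (for example at step 4) yields nearly a full one.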
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lowe, R., Mannella, F., Ziemke, T., Baldassarre, G. (2011). Modelling Coordination of Learning Systems: A Reservoir Systems Approach to Dopamine Modulated Pavlovian Conditioning. In: Kampis, G., Karsai, I., Szathmáry, E. (eds) Advances in Artificial Life. Darwin Meets von Neumann. ECAL 2009. Lecture Notes in Computer Science, vol 5777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21283-3_51
DOI: https://doi.org/10.1007/978-3-642-21283-3_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21282-6
Online ISBN: 978-3-642-21283-3
eBook Packages: Computer Science, Computer Science (R0)