Abstract
Neuroscientists have identified a neural substrate of prediction and reward in experiments with primates. The so-called dopamine neurons have been shown to code an error in the temporal prediction of rewards. Similarly, artificial systems can “learn to predict” by the so-called temporal-difference (TD) methods. Based on the general resemblance between the effective reinforcement term of TD models and the response of dopamine neurons, neuroscientists have developed a TD-learning time-delay actor-critic neural model and compared its performance with the behavior of monkeys in the laboratory. We have used such a neural network model to learn to predict variable-delay rewards in a robot spatial choice task similar to the one used by neuroscientists with primates. Such an architecture implementing TD-learning appears as a promising mechanism for robotic systems that learn from simple human teaching signals in the real world.
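The "effective reinforcement term" mentioned above is the TD error, which plays the role of the dopamine-like signal in both the critic (reward prediction) and the actor (action selection). A minimal sketch of a tabular TD(0) actor-critic update is given below; the state/action encoding and the parameter names (`n_states`, `alpha_v`, `alpha_p`, `gamma`) are illustrative assumptions, not details taken from the chapter's model.

```python
import numpy as np

def td_error(V, s, r, s_next, gamma=0.9):
    """Temporal-difference error: the 'effective reinforcement' term
    whose time course resembles dopamine-neuron responses."""
    return r + gamma * V[s_next] - V[s]

n_states, n_actions = 5, 2
V = np.zeros(n_states)                   # critic: state-value estimates
prefs = np.zeros((n_states, n_actions))  # actor: action preferences
alpha_v, alpha_p = 0.1, 0.1              # learning rates (illustrative)

def step(s, a, r, s_next):
    # The same scalar error drives both learners, as in actor-critic
    # architectures with a dopamine-like reinforcement signal.
    delta = td_error(V, s, r, s_next)
    V[s] += alpha_v * delta              # critic learns to predict reward
    prefs[s, a] += alpha_p * delta       # actor is reinforced/weakened
    return delta
```

After training, a reward that was fully predicted yields a TD error near zero, which is the hallmark behavior this family of models shares with recorded dopamine neurons.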
© 2001 Springer-Verlag Berlin Heidelberg
Pérez-Uribe, A. (2001). Using a Time-Delay Actor-Critic Neural Architecture with Dopamine-Like Reinforcement Signal for Learning in Autonomous Robots. In: Wermter, S., Austin, J., Willshaw, D. (eds) Emergent Neural Computational Architectures Based on Neuroscience. Lecture Notes in Computer Science, vol 2036. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44597-8_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42363-8
Online ISBN: 978-3-540-44597-5