Abstract
Neuroscientists have identified a neural substrate of prediction and reward in experiments with primates. The so-called dopamine neurons have been shown to code an error in the temporal prediction of rewards. Similarly, artificial systems can “learn to predict” by the so-called temporal-difference (TD) methods. Based on the general resemblance between the effective reinforcement term of TD models and the response of dopamine neurons, neuroscientists have developed a TD-learning time-delay actor-critic neural model and compared its performance with the behavior of monkeys in the laboratory. We have used such a neural network model to learn to predict variable-delay rewards in a robot spatial choice task similar to the one used by neuroscientists with primates. Such an architecture implementing TD-learning appears as a promising mechanism for robotic systems that learn from simple human teaching signals in the real world.
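The "effective reinforcement term" mentioned above is the TD error, which plays the role of the dopamine-like signal in both the critic (reward prediction) and the actor (action selection). A minimal sketch of a tabular TD(0) actor-critic update is given below; the state/action encoding and the parameter names (`n_states`, `alpha_v`, `alpha_p`, `gamma`) are illustrative assumptions, not details taken from the chapter's model.

```python
import numpy as np

def td_error(V, s, r, s_next, gamma=0.9):
    """Temporal-difference error: the 'effective reinforcement' term
    whose time course resembles dopamine-neuron responses."""
    return r + gamma * V[s_next] - V[s]

n_states, n_actions = 5, 2
V = np.zeros(n_states)                   # critic: state-value estimates
prefs = np.zeros((n_states, n_actions))  # actor: action preferences
alpha_v, alpha_p = 0.1, 0.1              # learning rates (illustrative)

def step(s, a, r, s_next):
    # The same scalar error drives both learners, as in actor-critic
    # architectures with a dopamine-like reinforcement signal.
    delta = td_error(V, s, r, s_next)
    V[s] += alpha_v * delta              # critic learns to predict reward
    prefs[s, a] += alpha_p * delta       # actor is reinforced/weakened
    return delta
```

After training, a reward that was fully predicted yields a TD error near zero, which is the hallmark behavior this family of models shares with recorded dopamine neurons.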
© 2001 Springer-Verlag Berlin Heidelberg
Pérez-Uribe, A. (2001). Using a Time-Delay Actor-Critic Neural Architecture with Dopamine-Like Reinforcement Signal for Learning in Autonomous Robots. In: Wermter, S., Austin, J., Willshaw, D. (eds) Emergent Neural Computational Architectures Based on Neuroscience. Lecture Notes in Computer Science, vol 2036. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44597-8_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42363-8
Online ISBN: 978-3-540-44597-5