Using a Time-Delay Actor-Critic Neural Architecture with Dopamine-Like Reinforcement Signal for Learning in Autonomous Robots

  • Chapter
  • First Online:
Emergent Neural Computational Architectures Based on Neuroscience

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2036)

Abstract

Neuroscientists have identified a neural substrate of prediction and reward in experiments with primates. The so-called dopamine neurons have been shown to code an error in the temporal prediction of rewards. Similarly, artificial systems can “learn to predict” by so-called temporal-difference (TD) methods. Based on the general resemblance between the effective reinforcement term of TD models and the response of dopamine neurons, neuroscientists have developed a TD-learning time-delay actor-critic neural model and compared its performance with the behavior of monkeys in the laboratory. We have used such a neural network model to learn to predict variable-delay rewards in a robot spatial choice task similar to the one used by neuroscientists with primates. Such an architecture implementing TD-learning appears to be a promising mechanism for robotic systems that learn from simple human teaching signals in the real world.
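The "effective reinforcement term" referred to above is the temporal-difference error, δ(t) = r(t) + γ·P(t) − P(t−1), which is the quantity compared to the phasic dopamine response. The sketch below is a minimal illustration of that idea, not the chapter's time-delay neural network: a small tabular actor-critic in which the same TD error both trains the critic's reward prediction and reinforces the actor's choice preferences. The toy task layout, parameter values, and function names are assumptions made purely for illustration.

```python
import math
import random

# Illustrative sketch only: a tabular actor-critic driven by a TD error
# ("dopamine-like" reinforcement signal). This is NOT the chapter's
# time-delay network; states, parameters, and names are assumed.

GAMMA = 0.95          # discount factor (assumed)
ALPHA_CRITIC = 0.10   # critic learning rate (assumed)
ALPHA_ACTOR = 0.05    # actor learning rate (assumed)

N_STATES, N_ACTIONS = 5, 2                             # toy spatial-choice task
V = [0.0] * N_STATES                                   # critic: reward predictions P(s)
prefs = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # actor: action preferences

def choose_action(state):
    """Softmax choice over the actor's preferences for this state."""
    exps = [math.exp(p) for p in prefs[state]]
    z = sum(exps)
    r, acc = random.random(), 0.0
    for action, e in enumerate(exps):
        acc += e / z
        if r <= acc:
            return action
    return N_ACTIONS - 1

def td_update(state, action, reward, next_state):
    """One learning step: the same TD error trains critic and actor."""
    next_v = 0.0 if next_state is None else V[next_state]
    delta = reward + GAMMA * next_v - V[state]    # effective reinforcement term
    V[state] += ALPHA_CRITIC * delta              # critic improves its prediction
    prefs[state][action] += ALPHA_ACTOR * delta   # actor reinforced by same delta
    return delta
```

In a variable-delay version of the task, the reward simply arrives several time steps after the choice; over repeated trials the critic's predictions propagate that delayed credit back toward the choice state, which is the behavior the chapter exploits on the robot.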




Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Pérez-Uribe, A. (2001). Using a Time-Delay Actor-Critic Neural Architecture with Dopamine-Like Reinforcement Signal for Learning in Autonomous Robots. In: Wermter, S., Austin, J., Willshaw, D. (eds) Emergent Neural Computational Architectures Based on Neuroscience. Lecture Notes in Computer Science, vol 2036. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44597-8_37


  • DOI: https://doi.org/10.1007/3-540-44597-8_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42363-8

  • Online ISBN: 978-3-540-44597-5

  • eBook Packages: Springer Book Archive
