
Actor-Critic Learning for Platform-Independent Robot Navigation


Abstract

This article describes a reinforcement learning approach to robot control and a new Modular Actor-Critic architecture that supports platform-independent control. The architecture is tested on a landmark-approaching task using movable pan/tilt cameras, and it successfully drives both a large PeopleBot and a small Sony Aibo robot through the navigation task with no retraining required. The work offers insight into skill transfer between different robotic platforms and into the modularisation that results from splitting the control task into its component parts. The architecture and its underlying principles could support rapid prototyping of new robotic platforms, where an already functioning control system is reused to enable more sophisticated navigation.
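For readers unfamiliar with the underlying algorithm, the sketch below shows a minimal tabular actor-critic update (softmax policy, TD error, critic update, actor update). It is illustrative only and is not the paper's Modular Actor-Critic architecture, which operates on camera input across robot platforms; the state and action counts, learning rates, and function names here are assumptions chosen for the example.

```python
import numpy as np

# Minimal tabular actor-critic sketch (illustrative; assumes a small
# discrete state space rather than the paper's camera-based input).
n_states, n_actions = 16, 4        # hypothetical landmark-relative states
gamma, alpha_c, alpha_a = 0.9, 0.1, 0.05

V = np.zeros(n_states)                   # critic: state-value estimates
prefs = np.zeros((n_states, n_actions))  # actor: action preferences

def softmax_policy(state, rng):
    """Sample an action from a softmax over the actor's preferences."""
    p = np.exp(prefs[state] - prefs[state].max())  # shift for stability
    p /= p.sum()
    return rng.choice(n_actions, p=p)

def actor_critic_step(s, a, r, s_next, done):
    """One TD(0) actor-critic update after observing (s, a, r, s')."""
    target = r if done else r + gamma * V[s_next]
    delta = target - V[s]            # TD error: the critic's "surprise"
    V[s] += alpha_c * delta          # critic moves toward the TD target
    prefs[s, a] += alpha_a * delta   # actor reinforces a when delta > 0
```

With an environment loop supplying (s, a, r, s') transitions, repeated calls to actor_critic_step implement standard TD(0) actor-critic learning; the paper's contribution lies in modularising such controllers so they transfer across robot platforms.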



Acknowledgements

Early stages of this study were supported in part by the MirrorBot and NestCom projects coordinated by Prof. Wermter. Thanks go to Kim Forster for her constant support and encouragement, to Dr. Kevin Burn for discussions, and to Chris Rowan, who assisted in setting up the robots and experiments.

Author information


Correspondence to Stefan Wermter.


About this article

Cite this article

Muse, D., Wermter, S. Actor-Critic Learning for Platform-Independent Robot Navigation. Cogn Comput 1, 203–220 (2009). https://doi.org/10.1007/s12559-009-9021-z


Keywords

Navigation