Abstract
This article describes a reinforcement learning approach to robot control and a new Modular Actor-Critic architecture that supports platform-independent control. The architecture is tested on a landmark-approaching task using movable pan/tilt cameras, and it successfully controls both a large PeopleBot and a small Sony Aibo robot on the navigation task with no retraining required. The architecture offers insight into skill transfer between different robotic platforms and into the modularisation that results from splitting the control task into its component parts. The architecture and its underlying principles could be used in the rapid prototyping of new robotic platforms, where an already functioning control system allows more sophisticated navigation to be developed.
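To make the Actor-Critic idea behind the architecture concrete, the following is a minimal tabular sketch, not the paper's modular architecture: a critic learns state values from the temporal-difference (TD) error, and the same error adjusts the actor's action preferences. The five-state corridor environment, learning rates, and discount factor are all illustrative assumptions.

```python
# Minimal tabular actor-critic sketch (illustrative assumptions throughout;
# this is NOT the modular architecture described in the article).
# Environment: a 5-state corridor; the agent starts at state 0 and
# receives reward 1 on reaching state 4 (terminal).
import numpy as np

rng = np.random.default_rng(0)
N_STATES = 5
ACTIONS = (-1, +1)                          # move left / move right
GAMMA, ALPHA_CRITIC, ALPHA_ACTOR = 0.9, 0.1, 0.1

V = np.zeros(N_STATES)                      # critic: state-value estimates
prefs = np.zeros((N_STATES, len(ACTIONS)))  # actor: action preferences

def policy(s):
    """Softmax over the actor's preferences for state s."""
    e = np.exp(prefs[s] - prefs[s].max())
    return e / e.sum()

for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        a = rng.choice(len(ACTIONS), p=policy(s))
        s_next = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        terminal = (s_next == N_STATES - 1)
        # The TD error drives both the critic and the actor updates.
        delta = r + GAMMA * V[s_next] * (not terminal) - V[s]
        V[s] += ALPHA_CRITIC * delta
        prefs[s, a] += ALPHA_ACTOR * delta
        s = s_next

# After training, the actor prefers moving right and the critic's
# values increase toward the goal state.
print(policy(0))
print(V)
```

The key design point this sketch illustrates is the separation of concerns: the critic evaluates states while the actor selects actions, and only the scalar TD error couples them, which is what makes the components modular and, in principle, swappable across platforms.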
Acknowledgements
Early stages of this study were supported partially by the MirrorBot and NestCom projects coordinated by Prof. Wermter. Thanks go to Kim Forster for her constant support and encouragement, to Dr. Kevin Burn for discussions, and to Chris Rowan, who assisted in setting up the robots and experiments.
Cite this article
Muse, D., Wermter, S. Actor-Critic Learning for Platform-Independent Robot Navigation. Cogn Comput 1, 203–220 (2009). https://doi.org/10.1007/s12559-009-9021-z