Abstract
A Policy Gradient Reinforcement Learning (RL) technique is used to design the low-level controllers that drive the joints of articulated mobile robots, posed as a search in the controller's parameter space. An unknown value function measures the quality of the controller with respect to its parameters; the search is guided by an approximation of the gradient of this value function, estimated from the robot's own experiences, and the behaviors emerge from this process. The technique is employed within a structure that processes sensor information to achieve coordination. The structure is based on a modularization principle in which complex overall behavior results from the interaction of individually 'simple' components. The simple components used are standard low-level controllers (PID) whose outputs are combined, sharing information between articulations and thereby taking integrated control actions. Modularization and learning are cognitive features, and here we endow the robots with both. Learning experiences with simulated robots are presented as demonstration.
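The gradient-approximation loop described in the abstract can be sketched as follows. This is a minimal illustration only: `episode_return` is a hypothetical stand-in for a robot rollout (a toy quadratic with a known optimum), and the specific gain values and step sizes are assumptions, not the paper's actual setup.

```python
import numpy as np

def episode_return(theta):
    # Stand-in for a robot rollout: in practice the return is unknown
    # and only observable by running the controller on the robot.
    # Here, a smooth toy function with a maximum at theta_star.
    theta_star = np.array([2.0, 0.5, 0.1])  # hypothetical "best" PID gains
    return -np.sum((theta - theta_star) ** 2)

def fd_gradient(f, theta, eps=1e-2):
    # Central finite differences: perturb each parameter (e.g. a PID
    # gain) in turn and estimate the slope of the return.
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = eps
        grad[i] = (f(theta + e) - f(theta - e)) / (2.0 * eps)
    return grad

def policy_gradient_search(f, theta0, alpha=0.1, steps=200):
    # Gradient ascent on the estimated gradient of the value function.
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        theta += alpha * fd_gradient(f, theta)
    return theta

gains = policy_gradient_search(episode_return, [0.0, 0.0, 0.0])
```

With the toy return above, the search converges to the assumed optimal gains; on a real robot each call to `episode_return` would be one (costly, noisy) trial, which is why gradient-approximation methods of this kind keep the number of perturbations per update small.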
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Pardo Ayala, D.E., Angulo Bahón, C. (2007). Emerging Behaviors by Learning Joint Coordination in Articulated Mobile Robots. In: Sandoval, F., Prieto, A., Cabestany, J., Graña, M. (eds) Computational and Ambient Intelligence. IWANN 2007. Lecture Notes in Computer Science, vol 4507. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73007-1_97
Print ISBN: 978-3-540-73006-4
Online ISBN: 978-3-540-73007-1