Abstract
Policy learning approaches are among the best-suited methods for high-dimensional, continuous control systems such as anthropomorphic robot arms and humanoid robots. In this paper, we make two contributions. First, we present a unified perspective that allows us to derive several policy learning algorithms, namely policy gradient algorithms, natural-gradient algorithms, and EM-like policy learning, from a common point of view. Second, we present several applications, both to robot motor primitive learning and to robot control in task space. Results from simulation and from several different real robots are shown.
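To make the first of these algorithm families concrete, the following is a minimal sketch of a "vanilla" policy gradient (REINFORCE-style) update on a toy one-dimensional setpoint task. The task, the linear-Gaussian policy, the running-mean baseline, and all constants are illustrative assumptions for exposition, not the paper's unified derivation; natural-gradient variants would additionally precondition the estimated gradient with the inverse Fisher information matrix, and EM-like methods would instead use reward-weighted regression updates.

```python
# Minimal REINFORCE-style ("vanilla") policy gradient sketch on a toy
# 1-D setpoint task. Everything here -- the task, the linear-Gaussian
# policy, the baseline, and all constants -- is an illustrative
# assumption, not the paper's own formulation.
import numpy as np

rng = np.random.default_rng(0)

theta = np.zeros(2)   # policy parameters: [feedback gain, offset]
sigma = 0.5           # fixed exploration noise std (assumption)
alpha = 1e-3          # learning rate (assumption)

def rollout(theta, horizon=20):
    """One episode; returns summed grad-log-likelihood and the return."""
    x = rng.normal()                       # random initial state
    grad_log = np.zeros(2)
    ret = 0.0
    for _ in range(horizon):
        phi = np.array([x, 1.0])           # policy features
        mean = phi @ theta                 # linear-Gaussian policy mean
        u = rng.normal(mean, sigma)        # sampled action
        # d/dtheta log N(u; mean, sigma^2) = (u - mean)/sigma^2 * phi
        grad_log += (u - mean) / sigma**2 * phi
        x = x + 0.1 * u                    # simple integrator dynamics
        ret += -x**2                       # quadratic cost as (negative) reward
    return grad_log, ret

baseline = 0.0
for episode in range(2000):
    g, R = rollout(theta)
    baseline = 0.9 * baseline + 0.1 * R    # running-mean baseline (variance reduction)
    # REINFORCE update: theta <- theta + alpha * grad log pi(tau) * (R - b);
    # the clip is only a crude safeguard against noisy early updates.
    theta += alpha * np.clip(g * (R - baseline), -10.0, 10.0)

print("learned parameters:", theta)        # gain should become negative (stabilizing)
```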
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Peters, J., Kober, J., Nguyen-Tuong, D. (2008). Policy Learning – A Unified Perspective with Applications in Robotics. In: Girgin, S., Loth, M., Munos, R., Preux, P., Ryabko, D. (eds.) Recent Advances in Reinforcement Learning. EWRL 2008. Lecture Notes in Computer Science, vol. 5323. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89722-4_17
DOI: https://doi.org/10.1007/978-3-540-89722-4_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89721-7
Online ISBN: 978-3-540-89722-4
eBook Packages: Computer Science (R0)