Policy Learning for Motor Skills

Peters, Jan; Schaal, Stefan

doi:10.1007/978-3-540-69162-4_25

Jan Peters^1,2 &
Stefan Schaal^2,3

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 4985))

Included in the following conference series:

International Conference on Neural Information Processing

1568 Accesses
5 Citations

Abstract

Policy learning which allows autonomous robots to adapt to novel situations has been a long standing vision of robotics, artificial intelligence, and cognitive sciences. However, to date, learning techniques have yet to fulfill this promise as only few methods manage to scale into the high-dimensional domains of manipulator robotics, or even the new upcoming trend of humanoid robotics, and usually scaling was only achieved in precisely pre-structured domains. In this paper, we investigate the ingredients for a general approach policy learning with the goal of an application to motor skill refinement in order to get one step closer towards human-like performance. For doing so, we study two major components for such an approach, i.e., firstly, we study policy learning algorithms which can be applied in the general setting of motor skill learning, and, secondly, we study a theoretically well-founded general approach to representing the required control structures for task representation and execution.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 129.00; Price excludes VAT (USA)

Softcover Book: USD 169.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Aberdeen, D.: POMDPs and policy gradients. In: Proceedings of the Machine Learning Summer School (MLSS), Canberra, Australia (2006)
Google Scholar
Aberdeen, D.A.: Policy-Gradient Algorithms for Partially Observable Markov Decision Processes. PhD thesis, Australian National Unversity (2003)
Google Scholar
Dayan, P., Hinton, G.E.: Using expectation-maximization for reinforcement learning. Neural Computation 9(2), 271–278 (1997)
Article MATH Google Scholar
Ijspeert, A., Nakanishi, J., Schaal, S.: Learning attractor landscapes for learning motor primitives. In: Becker, S., Thrun, S., Obermayer, K. (eds.) Advances in Neural Information Processing Systems, vol. 15, pp. 1547–1554. MIT Press, Cambridge (2003)
Google Scholar
Kakade, S.A.: Natural policy gradient. In: Advances in Neural Information Processing Systems, Vancouver, CA, vol. 14 (2002)
Google Scholar
Konda, V., Tsitsiklis, J.: Actor-critic algorithms. Advances in Neural Information Processing Systems 12 (2000)
Google Scholar
Peters, J.: The bias of the greedy update. Technical report, University of Southern California (2007)
Google Scholar
Peters, J., Mistry, M., Udwadia, F., Cory, R., Nakanishi, J., Schaal, S.: A unifying methodology for the control of robotic systems. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Edmonton, Canada (2005)
Google Scholar
Peters, J., Schaal, S.: Learning operational space control. In: Proceedings of Robotics: Science and Systems (RSS), Philadelphia, PA (2006)
Google Scholar
Peters, J., Vijayakumar, S., Schaal, S.: Reinforcement learning for humanoid robotics. In: Proceedings of the IEEE-RAS International Conference on Humanoid Robots (HUMANOIDS), Karlsruhe, Germany (September 2003)
Google Scholar
Peters, J., Vijayakumar, S., Schaal, S.: Natural actor-critic. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 280–291. Springer, Heidelberg (2005)
Chapter Google Scholar
Richter, S., Aberdeen, D., Yu, J.: Natural actor-critic for road traffic optimisation. In: Schoelkopf, B., Platt, J.C., Hofmann, T. (eds.) Advances in Neural Information Processing Systems, vol. 19, MIT Press, Cambridge (2007)
Google Scholar
Schaal, S.: Dynamic movement primitives - a framework for motor control in humans and humanoid robots. In: Proceedings of the International Symposium on Adaptive Motion of Animals and Machines (2003)
Google Scholar
Schaal, S., Ijspeert, A., Billard, A.: Computational approaches to motor learning by imitation. In: Frith, C.D., Wolpert, D. (eds.) The Neuroscience of Social Interaction, pp. 199–218. Oxford University Press, Oxford (2004)
Google Scholar
Sciavicco, L., Siciliano, B.: Modeling and control of robot manipulators. MacGraw-Hill, Heidelberg (2007)
Google Scholar
Sutton, R.S., McAllester, D., Singh, S., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Solla, S.A., Leen, T.K., Mueller, K.-R. (eds.) Advances in Neural Information Processing Systems (NIPS), Denver, CO, MIT Press, Cambridge (2000)
Google Scholar

Download references

Author information

Authors and Affiliations

Max-Planck Institute for Biological Cybernetics, Spemannstr. 32, 72074, Tübingen,
Jan Peters
University of Southern California, Los Angeles, CA 90089, USA
Jan Peters & Stefan Schaal
ATR Computational Neuroscience Laboratory, Soraku-gun, Kyoto, 619-0288, Japan
Stefan Schaal

Authors

Jan Peters
View author publications
You can also search for this author in PubMed Google Scholar
Stefan Schaal
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Masumi Ishikawa Kenji Doya Hiroyuki Miyamoto Takeshi Yamakawa

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Peters, J., Schaal, S. (2008). Policy Learning for Motor Skills. In: Ishikawa, M., Doya, K., Miyamoto, H., Yamakawa, T. (eds) Neural Information Processing. ICONIP 2007. Lecture Notes in Computer Science, vol 4985. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69162-4_25

Download citation

DOI: https://doi.org/10.1007/978-3-540-69162-4_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69159-4
Online ISBN: 978-3-540-69162-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics