Natural Policy Gradient Reinforcement Learning for a CPG Control of a Biped Robot

Nakamura, Yutaka; Mori, Takeshi; Ishii, Shin

doi:10.1007/978-3-540-30217-9_98

Yutaka Nakamura, Takeshi Mori & Shin Ishii

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3242)

Abstract

Motivated by the view that animals’ rhythmic movements, such as locomotion, are controlled by neural circuits called central pattern generators (CPGs), researchers have studied motor control mechanisms based on CPGs. As an autonomous learning framework for a CPG controller, we previously proposed a reinforcement learning (RL) method called the CPG-actor-critic method. In this article, we propose a natural policy gradient learning algorithm for the CPG-actor-critic method and apply it to an automatic control problem using a biped robot simulator. Computer simulations show that our RL method makes the biped robot walk stably on various terrain.

Author information

Authors and Affiliations

CREST, JST
Yutaka Nakamura & Shin Ishii
Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara, 630-0192, Japan
Yutaka Nakamura, Takeshi Mori & Shin Ishii

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Xin Yao
Automated Scheduling, Optimisation and Planning Group, School of Computer Science & IT, University of Nottingham, NG8 1BB, Nottingham, UK
Edmund K. Burke
Intelligent Systems Group Department of Computer Science and Artificial Intelligence, University of the Basque Country, Paseo Manuel de Lardizábal 1, 20018, San Sebastian, Donostia, Spain
José A. Lozano
University of the West of England, Bristol, United Kingdom
Jim Smith
GeNeura Team, Depto. Arquitectura y Tecnología de Computadores, Universidad de Granada, Spain
Juan Julián Merelo-Guervós
School of Computer Science, The University of Birmingham, B15 1TT, Edgbaston, Birmingham, UK
John A. Bullinaria
School of Computer Science, University of Birmingham, B15 2TT, Birmingham, Great Britain
Jonathan E. Rowe
School of Computer Science, University of Birmingham, United Kingdom
Peter Tiňo
School of Computer Science, The University of Birmingham, B15 2TT, Birmingham, UK
Ata Kabán
Faculty of Computer Science, Dortmund University of Technology, 44221, Dortmund, Germany
Hans-Paul Schwefel

About this paper

Cite this paper

Nakamura, Y., Mori, T., Ishii, S. (2004). Natural Policy Gradient Reinforcement Learning for a CPG Control of a Biped Robot. In: Yao, X., et al. Parallel Problem Solving from Nature - PPSN VIII. PPSN 2004. Lecture Notes in Computer Science, vol 3242. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30217-9_98

DOI: https://doi.org/10.1007/978-3-540-30217-9_98
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23092-2
Online ISBN: 978-3-540-30217-9
eBook Packages: Springer Book Archive
