A feature selection method for a sample-based stochastic policy

Original Article · Artificial Life and Robotics

Abstract

Stochastic policy gradient methods have been applied to a variety of robot control tasks, such as the acquisition of motor skills, because they can learn in high-dimensional, continuous feature spaces when combined with heuristics such as motor primitives. However, when one of these methods is applied to a real-world task, it is difficult to design a policy function and feature space that represent the task well, because sufficient prior knowledge about the task is rarely available. In this research, we propose a method that autonomously extracts a feature space suited to achieving a task, using a stochastic policy gradient method for a sample-based policy. We apply our method to the control of a linear dynamical system, and computer simulation results show that a desirable controller is obtained and that feature selection improves the controller's performance.
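To make the setting concrete, below is a minimal sketch of an episodic stochastic policy gradient (a REINFORCE-style update) with a Gaussian policy over a fixed, hand-picked feature map, regulating a toy linear dynamical system. This is context only, not the paper's method: the dynamics matrices, feature map, cost, and step sizes are illustrative assumptions, and the paper's actual contributions, the sample-based policy and the automatic feature selection, are not reproduced here.

```python
import numpy as np

# Minimal sketch: episodic REINFORCE with a Gaussian policy
# u ~ N(w^T phi(x), sigma^2) regulating a toy linear system x' = A x + B u.
# All constants are illustrative assumptions, not values from the paper.

rng = np.random.default_rng(0)

A = np.array([[1.0, 0.1],
              [0.0, 1.0]])      # assumed plant dynamics (discrete double integrator)
B = np.array([0.0, 0.1])        # assumed control input vector

def features(x):
    """Fixed feature map phi(x): raw state plus a bias term. The paper's
    method would instead select such features automatically."""
    return np.append(x, 1.0)

def rollout(w, sigma=0.3, T=50):
    """One episode: returns the total reward and the summed score function
    grad_w log pi(u|x) of the Gaussian policy."""
    x = rng.normal(size=2)
    total_r, score = 0.0, np.zeros_like(w)
    for _ in range(T):
        phi = features(x)
        mu = w @ phi
        u = mu + sigma * rng.normal()
        score += (u - mu) / sigma**2 * phi   # d log N(u; mu, sigma^2) / dw
        x = A @ x + B * u
        total_r -= x @ x + 0.01 * u * u      # quadratic regulation cost
    return total_r, score

w = np.zeros(3)
baseline = 0.0
for episode in range(3000):
    R, g = rollout(w)
    baseline += 0.05 * (R - baseline)        # moving-average reward baseline
    w += 1e-5 * (R - baseline) * g           # ascend the estimated gradient

print("learned policy weights:", w)
```

In this framing, the problem the paper addresses is choosing which components enter `features(x)`: with a poorly chosen feature set the same gradient update converges to a worse controller, which is consistent with the abstract's claim that feature selection improves performance.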




Acknowledgments

This work was partly supported by JSPS KAKENHI Grant Number 26730136.

Author information

Correspondence to Yutaka Nakamura.

Additional information

This work was presented in part at the 19th International Symposium on Artificial Life and Robotics, Beppu, Oita, January 22–24, 2014.

About this article


Cite this article

Yamanaka, J., Nakamura, Y. & Ishiguro, H. A feature selection method for a sample-based stochastic policy. Artif Life Robotics 19, 251–257 (2014). https://doi.org/10.1007/s10015-014-0158-9
