A feature selection method for a sample-based stochastic policy

Original Article · Artificial Life and Robotics

Abstract

Stochastic policy gradient methods have been applied to a variety of robot control tasks, such as the acquisition of motor skills, because they can learn in high-dimensional, continuous feature spaces when combined with heuristics such as motor primitives. However, when one of these methods is applied to a real-world task, it is difficult to design a policy function and feature space that represent the task well, because sufficient prior knowledge about the task is rarely available. In this research, we propose a method that autonomously extracts a feature space suited to achieving a task, using a stochastic policy gradient method for a sample-based policy. We apply our method to the control of a linear dynamical system, and computer simulation results show that a desirable controller is obtained and that feature selection improves the controller's performance.
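To make the setting concrete, below is a minimal sketch of an episodic stochastic policy gradient (a REINFORCE-style update) with a Gaussian policy over a fixed, hand-picked feature map, regulating a toy linear dynamical system. This is context only, not the paper's method: the dynamics matrices, feature map, cost, and step sizes are illustrative assumptions, and the paper's actual contributions, the sample-based policy and the automatic feature selection, are not reproduced here.

```python
import numpy as np

# Minimal sketch: episodic REINFORCE with a Gaussian policy
# u ~ N(w^T phi(x), sigma^2) regulating a toy linear system x' = A x + B u.
# All constants are illustrative assumptions, not values from the paper.

rng = np.random.default_rng(0)

A = np.array([[1.0, 0.1],
              [0.0, 1.0]])      # assumed plant dynamics (discrete double integrator)
B = np.array([0.0, 0.1])        # assumed control input vector

def features(x):
    """Fixed feature map phi(x): raw state plus a bias term. The paper's
    method would instead select such features automatically."""
    return np.append(x, 1.0)

def rollout(w, sigma=0.3, T=50):
    """One episode: returns the total reward and the summed score function
    grad_w log pi(u|x) of the Gaussian policy."""
    x = rng.normal(size=2)
    total_r, score = 0.0, np.zeros_like(w)
    for _ in range(T):
        phi = features(x)
        mu = w @ phi
        u = mu + sigma * rng.normal()
        score += (u - mu) / sigma**2 * phi   # d log N(u; mu, sigma^2) / dw
        x = A @ x + B * u
        total_r -= x @ x + 0.01 * u * u      # quadratic regulation cost
    return total_r, score

w = np.zeros(3)
baseline = 0.0
for episode in range(3000):
    R, g = rollout(w)
    baseline += 0.05 * (R - baseline)        # moving-average reward baseline
    w += 1e-5 * (R - baseline) * g           # ascend the estimated gradient

print("learned policy weights:", w)
```

In this framing, the problem the paper addresses is choosing which components enter `features(x)`: with a poorly chosen feature set the same gradient update converges to a worse controller, which is consistent with the abstract's claim that feature selection improves performance.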




Acknowledgments

This work was partly supported by JSPS KAKENHI Grant Number 26730136.

Author information

Correspondence to Yutaka Nakamura.

Additional information

This work was presented in part at the 19th International Symposium on Artificial Life and Robotics, Beppu, Oita, January 22–24, 2014.

About this article


Cite this article

Yamanaka, J., Nakamura, Y. & Ishiguro, H. A feature selection method for a sample-based stochastic policy. Artif Life Robotics 19, 251–257 (2014). https://doi.org/10.1007/s10015-014-0158-9
