Abstract
Stochastic policy gradient methods have been applied to a variety of robot control tasks, such as the acquisition of motor skills, because they can learn in high-dimensional, continuous feature spaces when combined with heuristics such as motor primitives. In a real-world task, however, it is difficult to design a policy function and a feature space that represent the task well, because sufficient prior knowledge about the task is rarely available. In this research, we propose a method that autonomously extracts a feature space suited to the task, using a stochastic policy gradient method for a sample-based policy. We apply the method to the control of a linear dynamical system; computer simulation results show that a desirable controller is obtained and that the feature selection improves the controller's performance.
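The paper's sample-based policy and its feature-selection procedure are not reproduced here, but the following minimal sketch illustrates the underlying idea the abstract names: a stochastic policy gradient (a REINFORCE-style update of a linear-Gaussian policy) controlling a small linear dynamical system. The plant matrices A and B, the quadratic reward, and the hand-designed features() are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear plant x' = A x + B u + noise (not the paper's system).
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([0.0, 0.1])

def features(x):
    # Hand-designed features; choosing these well is the hard part that
    # motivates automatic feature selection.
    return x.copy()

w = np.zeros(2)        # policy weights
sigma = 0.5            # exploration noise of the Gaussian policy
alpha = 1e-4           # learning rate
baseline = 0.0         # running-average return, used to reduce variance

for episode in range(3000):
    x = rng.normal(size=2)
    grad = np.zeros(2)
    ret = 0.0
    for t in range(20):
        phi = features(x)
        u = w @ phi + sigma * rng.normal()       # sample action from N(w.phi, sigma^2)
        grad += (u - w @ phi) * phi / sigma**2   # grad of log pi(u|x) w.r.t. w
        ret += -(x @ x + 0.1 * u ** 2)           # quadratic reward (negative cost)
        x = A @ x + B * u + 0.01 * rng.normal(size=2)
    baseline += 0.05 * (ret - baseline)
    w += alpha * (ret - baseline) * grad         # REINFORCE update

print("learned feedback gains:", w)
```

Because the policy is linear in the features, the learned weights act as state-feedback gains; with poorly chosen features the same update rule would converge to a worse controller, which is the gap the paper's feature selection addresses.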
Acknowledgments
This work was partly supported by JSPS KAKENHI Grant Number 26730136.
Additional information
This work was presented in part at the 19th International Symposium on Artificial Life and Robotics, Beppu, Oita, January 22–24, 2014.
Cite this article
Yamanaka, J., Nakamura, Y. & Ishiguro, H. A feature selection method for a sample-based stochastic policy. Artif Life Robotics 19, 251–257 (2014). https://doi.org/10.1007/s10015-014-0158-9