
JRM Vol.24 No.2 pp. 330-339 (2012)
doi: 10.20965/jrm.2012.p0330

Paper:

Expression of Continuous State and Action Spaces for Q-Learning Using Neural Networks and CMAC

Kazuaki Yamada

Department of Mechanical Engineering, Toyo University, 2100 Kujirai, Kawagoe-shi, Saitama 350-8585, Japan

Received:
October 1, 2011
Accepted:
January 18, 2012
Published:
April 20, 2012
Keywords:
reinforcement learning, neural networks, CMAC, griddy Gibbs sampler, autonomous robots
Abstract
This paper proposes a new reinforcement learning algorithm that uses neural networks and CMAC to learn a mapping function between high-dimensional sensors and the motors of an autonomous robot. Conventional reinforcement learning algorithms require large amounts of memory because they describe high-dimensional mapping functions with lookup tables. Researchers have therefore tried to develop reinforcement learning algorithms that can learn such high-dimensional mapping functions directly. We apply the proposed method to an autonomous robot navigation problem and a multi-link robot arm reaching problem, and evaluate its effectiveness.
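For readers unfamiliar with CMAC, the following minimal Python sketch illustrates the general idea of tile coding (coarse coding) for value approximation. It is not the paper's actual implementation; the class name, tiling layout, and learning rate are hypothetical choices for illustration only.

    import numpy as np

    class CMAC:
        """Minimal tile-coding (CMAC) function approximator (illustrative sketch)."""

        def __init__(self, n_tilings=8, bins_per_dim=10, dims=2, low=0.0, high=1.0):
            self.n_tilings = n_tilings
            self.bins = bins_per_dim
            self.dims = dims
            self.low = low
            self.width = (high - low) / bins_per_dim
            # One weight table per tiling; each table covers the whole input range.
            self.weights = np.zeros((n_tilings,) + (bins_per_dim + 1,) * dims)
            # Each tiling is shifted by a fraction of one tile width.
            self.offsets = np.linspace(0.0, self.width, n_tilings, endpoint=False)

        def _indices(self, x):
            """Yield, for each tiling, the index of the tile containing input x."""
            x = np.asarray(x, dtype=float)
            for t in range(self.n_tilings):
                idx = np.floor((x - self.low + self.offsets[t]) / self.width).astype(int)
                idx = np.clip(idx, 0, self.bins)
                yield (t,) + tuple(idx)

        def predict(self, x):
            """Estimated value: sum of the weights of all active tiles."""
            return sum(self.weights[i] for i in self._indices(x))

        def update(self, x, target, alpha=0.1):
            """Move the prediction toward a target (e.g., a TD target in Q-learning)."""
            error = target - self.predict(x)
            for i in self._indices(x):
                self.weights[i] += alpha * error / self.n_tilings

    # Example usage (hypothetical values):
    cmac = CMAC(dims=2)
    cmac.update([0.3, 0.7], target=1.0)
    print(cmac.predict([0.3, 0.7]))

Because each input activates one tile per tiling, nearby inputs share tiles and therefore generalize to each other, while memory grows with the number of tilings rather than with the full resolution of a lookup table.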
Cite this article as:
K. Yamada, “Expression of Continuous State and Action Spaces for Q-Learning Using Neural Networks and CMAC,” J. Robot. Mechatron., Vol.24 No.2, pp. 330-339, 2012.
