Abstract
The knowledge embodied in an agent's policy consists of two types: knowledge of the environmental dynamics, which defines the state transitions around the agent, and behavior knowledge for solving a given task. In conventional reinforcement learning, these two types of knowledge are usually combined into state-value or action-value functions and learned together. If they were instead separated and learned independently, either could be reused in other tasks or environments. In our previous work, we presented learning rules based on policy gradients with an objective function consisting of two sets of parameters, one representing environmental dynamics and the other behavior knowledge, so that each type could be learned separately. In that framework, state-values served as the set of parameters corresponding to behavior knowledge. Simulation results on a pursuit problem showed that our method properly learned hunter-agent policies and could reuse either type of knowledge. In this paper, we adopt action-values instead of state-values as the behavior-knowledge parameters in the objective function and present learning rules for this formulation. Simulation results on the same pursuit problem as in our previous work show that these parameters and learning rules are also effective.
References
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Ishihara, S., Igarashi, H.: Behavior Learning Based on a Policy Gradient Method: Separation of Environmental Dynamics and State Values in Policies. In: Ho, T.-B., Zhou, Z.-H. (eds.) PRICAI 2008. LNCS (LNAI), vol. 5351, pp. 164–174. Springer, Heidelberg (2008)
Williams, R.J.: Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning 8, 229–256 (1992)
Kimura, H., Yamamura, M., Kobayashi, S.: Reinforcement Learning by Stochastic Hill Climbing on Discounted Reward. In: Proc. of ICML 1995, pp. 295–303 (1995)
Sutton, R.S., McAllester, D., Singh, S., Mansour, Y.: Policy Gradient Methods for Reinforcement Learning with Function Approximation. In: Advances in NIPS, vol. 12, pp. 1057–1063. MIT Press, Cambridge (2000)
Konda, V.R., Tsitsiklis, J.N.: Actor-Critic Algorithms. In: Advances in NIPS, vol. 12, pp. 1008–1014. MIT Press, Cambridge (2000)
Baird, L., Moore, A.: Gradient Descent for General Reinforcement Learning. In: Advances in NIPS, vol. 11, pp. 968–974. MIT Press, Cambridge (1999)
Igarashi, H., Ishihara, S., Kimura, M.: Reinforcement Learning in Non-Markov Decision Processes —Statistical Properties of Characteristic Eligibility—. Natural Sciences and Engineering 52(2), 1–7 (2008)
Ishihara, S., Igarashi, H.: Applying the Policy Gradient Method to Behavior Learning in Multi-agent Systems: The Pursuit Problem. Systems and Computers in Japan 37(10), 101–109 (2006)
Peshkin, L., Kim, K.E., Meuleau, N., Kaelbling, L.P.: Learning to Cooperate via Policy Search. In: Proc. of UAI 2000, pp. 489–496 (2000)
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Ishihara, S., Igarashi, H. (2011). Policy Gradient Reinforcement Learning with Environmental Dynamics and Action-Values in Policies. In: König, A., Dengel, A., Hinkelmann, K., Kise, K., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2011. Lecture Notes in Computer Science(), vol 6881. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23851-2_13
DOI: https://doi.org/10.1007/978-3-642-23851-2_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23850-5
Online ISBN: 978-3-642-23851-2
eBook Packages: Computer Science (R0)