Abstract
Reinforcement learning (RL) provides a computational model of an animal's autonomous acquisition of behaviors, even in an uncertain environment. Inverse reinforcement learning (IRL) addresses the inverse problem: given a history of an agent's behaviors, IRL attempts to determine unknown characteristics of the agent, such as its reward function. Conventional IRL methods usually assume that the agent follows a stationary policy that is optimal in its environment. However, real RL agents do not necessarily follow a stationary policy, because they are often still adapting to their environments. In particular, when facing an uncertain environment, an intelligent agent should adopt a mixed (or switching) strategy, consisting of exploitation, which is best in the current situation, and exploration, which resolves the environmental uncertainty. In this study, we propose a new IRL method that identifies both a non-stationary policy and a fixed but unknown reward function from the behavioral history of a learning agent; in particular, we estimate the change point at which the behavior policy switches from an exploratory one in the agent's early stage of learning to an exploitative one in its later stage. When applied to a computer simulation of an agent performing a simple maze task, our method identified both the change point of the behavior policy and the fixed reward function solely from the agent's history of behaviors.
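The change-point idea in the abstract can be illustrated with a minimal sketch. Here we assume (hypothetically; the paper estimates the reward function jointly, which this sketch skips) that the agent's action values `Q` for a single state are known, and that its behavior follows a softmax (Boltzmann) policy whose inverse temperature switches from a low exploratory value to a high exploitative value at an unknown time. The change point is then estimated by maximizing the log-likelihood of the observed action sequence over candidate switching times. All names (`BETA_EXPLORE`, `estimate_change_point`, etc.) are illustrative, not from the paper.

```python
import math
import random

def softmax(q, beta):
    """Boltzmann action probabilities with inverse temperature beta."""
    m = max(beta * qi for qi in q)
    e = [math.exp(beta * qi - m) for qi in q]
    z = sum(e)
    return [ei / z for ei in e]

# Hypothetical action values for one state with 4 actions (assumed known here).
Q = [1.0, 0.2, 0.1, 0.0]
BETA_EXPLORE, BETA_EXPLOIT = 0.3, 5.0  # near-uniform vs. near-greedy behavior

def simulate(T, t_star, seed=0):
    """Generate T actions; the policy switches from exploratory to
    exploitative at the true change point t_star."""
    rng = random.Random(seed)
    actions = []
    for t in range(T):
        beta = BETA_EXPLORE if t < t_star else BETA_EXPLOIT
        p = softmax(Q, beta)
        r, c, a = rng.random(), 0.0, len(p) - 1
        for i, pi in enumerate(p):
            c += pi
            if r < c:
                a = i
                break
        actions.append(a)
    return actions

def log_lik(actions, t_star):
    """Log-likelihood of the action sequence given candidate change point t_star."""
    p_lo = softmax(Q, BETA_EXPLORE)
    p_hi = softmax(Q, BETA_EXPLOIT)
    return sum(math.log((p_lo if t < t_star else p_hi)[a])
               for t, a in enumerate(actions))

def estimate_change_point(actions):
    """Maximum-likelihood estimate of the switching time."""
    return max(range(1, len(actions)), key=lambda k: log_lik(actions, k))

actions = simulate(T=400, t_star=200)
print(estimate_change_point(actions))  # estimate should lie near the true change point, 200
```

In the paper's setting the reward function (and hence `Q`) is unknown and must be inferred together with the change point; this sketch only isolates the likelihood-scan step over switching times.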
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Uchida, S., Oba, S., Ishii, S. (2017). Estimation of the Change of Agents Behavior Strategy Using State-Action History. In: Lintas, A., Rovetta, S., Verschure, P., Villa, A. (eds) Artificial Neural Networks and Machine Learning – ICANN 2017. ICANN 2017. Lecture Notes in Computer Science(), vol 10614. Springer, Cham. https://doi.org/10.1007/978-3-319-68612-7_12
DOI: https://doi.org/10.1007/978-3-319-68612-7_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68611-0
Online ISBN: 978-3-319-68612-7