Abstract
In this article, we propose a method for adapting the stepsize parameters used in reinforcement learning for non-stationary environments. In typical reinforcement learning settings, the stepsize parameter is decreased toward zero during learning, because the environment is assumed to be noisy but stationary, so that the true expected rewards are fixed. In the real world, however, the true expected rewards change over time, and the learning agent must therefore track these changes through continuous learning. We derive the higher-order derivatives, with respect to the stepsize parameter, of the exponential moving average that major reinforcement learning methods use to estimate the expected values of states or actions, and we present a mechanism to compute these derivatives recursively. Using this mechanism, we construct a precise and flexible adaptation method for the stepsize parameter that optimizes a given criterion, for example, minimizing the squared error. The proposed method is validated both theoretically and experimentally.
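To make the recursion concrete, below is a minimal first-order sketch in Python: the derivative of the exponential moving average with respect to the stepsize is maintained recursively and used to descend the squared prediction error. The paper derives higher-order derivatives; this first-order variant, and all names such as adapt_stepsize and meta_rate, are illustrative assumptions rather than the authors' exact algorithm.

```python
import random

def adapt_stepsize(observations, alpha=0.5, meta_rate=0.01,
                   alpha_min=1e-3, alpha_max=1.0):
    """Illustrative first-order stepsize adaptation for an exponential
    moving average (EMA).  The EMA update is est' = est + alpha * err,
    and d = d(est)/d(alpha) is carried along recursively."""
    est = observations[0]   # EMA estimate of the (possibly drifting) mean
    d = 0.0                 # d = d(est)/d(alpha), updated recursively
    for x in observations[1:]:
        err = x - est                       # prediction error before update
        # Gradient step on the squared error (1/2)*err**2 with respect to
        # alpha: d(err)/d(alpha) = -d, so alpha moves along +err*d.
        alpha = min(alpha_max, max(alpha_min, alpha + meta_rate * err * d))
        # Recursive derivative update: since est' = (1-alpha)*est + alpha*x,
        # d' = (1 - alpha)*d + (x - est) = (1 - alpha)*d + err.
        d = (1.0 - alpha) * d + err
        est += alpha * err                  # standard EMA update
    return est, alpha

if __name__ == "__main__":
    rng = random.Random(0)
    # Non-stationary stream: the true mean jumps from 0.0 to 1.0 halfway.
    data = ([rng.gauss(0.0, 0.1) for _ in range(500)] +
            [rng.gauss(1.0, 0.1) for _ in range(500)])
    est, alpha = adapt_stepsize(data)
    print(f"final estimate {est:.3f}, adapted stepsize {alpha:.3f}")
```

On a stationary stream the gradient signal err*d averages toward zero and alpha settles low, filtering noise; after a jump in the true mean, err and d share a sign for many steps, so alpha grows and the estimate catches up quickly.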
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Noda, I. (2010). Recursive Adaptation of Stepsize Parameter for Non-stationary Environments. In: Taylor, M.E., Tuyls, K. (eds) Adaptive and Learning Agents. ALA 2009. Lecture Notes in Computer Science, vol. 5924. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11814-2_5
DOI: https://doi.org/10.1007/978-3-642-11814-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11813-5
Online ISBN: 978-3-642-11814-2