Abstract
In many reinforcement learning applications and solutions, time information is implicitly contained in the state information. Although time-sequential decomposition is inherent to dynamic programming, this aspect has simply been omitted in most Q-learning applications. A non-stationary environment, a non-stationary reward or punishment, or a time-dependent cost to minimize naturally leads to non-stationary optimal solutions, in which time has to be explicitly accounted for in the search for the optimal solution. Although largely neglected so far, non-stationarity, and the computational cost it is likely to induce rapidly, should become a concern for the reinforcement learning community. The particular nature of time calls for dedicated treatment when developing economical, heuristic solutions. In this paper, two such heuristics are proposed, justified, and illustrated on a simple application.
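To make the role of explicit time concrete, the sketch below shows tabular Q-learning in which the value table is indexed by (state, time, action) rather than (state, action), so the learned policy is allowed to change with time. This is only an illustration of the time-augmentation idea discussed in the abstract, not the economical heuristics proposed in the paper; the toy interface step(s, t, a) and all parameter values are assumptions.

# Minimal sketch (not the paper's method): tabular Q-learning with time folded
# into the lookup key, so the learned policy can be non-stationary.
# The environment interface step(s, t, a) and all hyperparameters are
# illustrative assumptions.
import random
from collections import defaultdict

def q_learning_time_indexed(step, n_actions, horizon,
                            episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """step(s, t, a) -> (next_state, reward); the reward may depend on t,
    which is what makes the problem non-stationary."""
    Q = defaultdict(float)  # keyed by (state, time, action): grows with the horizon
    for _ in range(episodes):
        s = 0  # assumed initial state of the toy environment
        for t in range(horizon):
            # epsilon-greedy action choice over the current time slice
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda a_: Q[(s, t, a_)])
            s_next, r = step(s, t, a)
            # Bootstrap from the *next* time slice, since values are time-dependent
            if t + 1 < horizon:
                best_next = max(Q[(s_next, t + 1, a_)] for a_ in range(n_actions))
            else:
                best_next = 0.0
            Q[(s, t, a)] += alpha * (r + gamma * best_next - Q[(s, t, a)])
            s = s_next
    return Q

Because every time step of the horizon gets its own slice of the table, memory and learning cost grow roughly linearly with the horizon; this is the computational burden that motivates looking for more economical heuristics.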
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chatenet, N. and Bersini, H. (1997). Economical reinforcement learning for non-stationary problems. In: Gerstner, W., Germond, A., Hasler, M., Nicoud, J.D. (eds) Artificial Neural Networks — ICANN'97. ICANN 1997. Lecture Notes in Computer Science, vol 1327. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0020168
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63631-1
Online ISBN: 978-3-540-69620-9