Open Theoretical Questions in Reinforcement Learning

  • Conference paper
  • In: Computational Learning Theory (EuroCOLT 1999)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1572)

Abstract

Reinforcement learning (RL) concerns the problem of a learning agent interacting with its environment to achieve a goal. Instead of being given examples of desired behavior, the learning agent must discover by trial and error how to behave in order to get the most reward. The environment is a Markov decision process (MDP) with state set, \( \mathcal{S} \), and action set, \( \mathcal{A} \). The agent and the environment interact in a sequence of discrete steps, t = 0, 1, 2, ... The state and action at one time step, \( s_t \in \mathcal{S} \) and \( a_t \in \mathcal{A} \), determine the probability distribution for the state at the next time step, \( s_{t + 1} \in \mathcal{S} \), and, jointly, the distribution for the next reward, \( r_{t+1} \in \Re \). The agent’s objective is to choose each \( a_t \) to maximize the subsequent return:

$$ R_t = \sum\limits_{k = 0}^\infty {\gamma ^k r_{t + 1 + k} ,} $$

where the discount rate, 0 ≤ γ ≤ 1, determines the relative weighting of immediate and delayed rewards. In some environments, the interaction consists of a sequence of episodes, each starting in a given state and ending upon arrival in a terminal state, terminating the series above. In other cases the interaction is continual, without interruption, and the sum may have an infinite number of terms (in which case we usually assume γ < 1). Infinite horizon cases with γ = 1 are also possible though less common (e.g., see Mahadevan, 1996).
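
As a concrete illustration (not part of the paper; the environment interface, function names, and reward values below are assumptions made only for this sketch), the return above could be accumulated over one episode of agent-environment interaction as follows:

    def run_episode(env, policy, gamma=0.9):
        """Run one episode and accumulate R_0 = sum_k gamma^k * r_{1+k}.

        Hypothetical interfaces, assumed only for illustration:
        env.reset() -> initial state, env.step(a) -> (next_state, reward, terminal);
        policy maps a state to an action.
        """
        state = env.reset()
        ret, discount = 0.0, 1.0
        terminal = False
        while not terminal:
            action = policy(state)                      # choose a_t
            state, reward, terminal = env.step(action)  # observe s_{t+1}, r_{t+1}
            ret += discount * reward                    # add gamma^t * r_{t+1}
            discount *= gamma
        return ret

    # The same sum for a fixed (illustrative) reward sequence r_1, r_2, r_3 = 1, 0, 2 with gamma = 0.9:
    rewards = [1.0, 0.0, 2.0]
    print(sum(0.9 ** k * r for k, r in enumerate(rewards)))  # 1 + 0.9*0 + 0.81*2 = 2.62

For an episodic task the loop terminates on arrival in a terminal state; in the continuing case the sum has infinitely many terms and gamma < 1 keeps it finite.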

References

  • Baird, L.C. (1995). Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the Twelfth International Conference on Machine Learning, pp. 30–37. Morgan Kaufmann, San Francisco.

  • Bertsekas, D.P., and Tsitsiklis, J.N. (1996). Neuro-Dynamic Programming. Athena Scientific, Belmont, MA.

  • Crites, R.H., and Barto, A.G. (1996). Improving elevator performance using reinforcement learning. In Advances in Neural Information Processing Systems: Proceedings of the 1995 Conference, pp. 1017–1023. MIT Press, Cambridge, MA.

  • Kearns, M., Mansour, Y., and Ng, A.Y. (in prep.). Sparse sampling methods for planning and learning in large and partially observable Markov decision processes.

  • Loch, J., and Singh, S. (1998). Using eligibility traces to find the best memoryless policy in partially observable Markov decision processes. In Proceedings of the Fifteenth International Conference on Machine Learning. Morgan Kaufmann, San Francisco.

  • Mahadevan, S. (1996). Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning, 22:159–196.

  • Moore, A.W., and Atkeson, C.G. (1993). Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, 13:103–130.

  • Singh, S.P. (1993). Learning to Solve Markovian Decision Processes. Ph.D. thesis, University of Massachusetts, Amherst. Appeared as CMPSCI Technical Report 93-77.

  • Singh, S.P., and Bertsekas, D. (1997). Reinforcement learning for dynamic channel allocation in cellular telephone systems. In Advances in Neural Information Processing Systems: Proceedings of the 1996 Conference, pp. 974–980. MIT Press, Cambridge, MA.

  • Singh, S., and Dayan, P. (1998). Analytical mean squared error curves for temporal difference learning. Machine Learning.

  • Singh, S.P., and Sutton, R.S. (1996). Reinforcement learning with replacing eligibility traces. Machine Learning, 22:123–158.

  • Sutton, R.S. (1984). Temporal Credit Assignment in Reinforcement Learning. Ph.D. thesis, University of Massachusetts, Amherst.

  • Sutton, R.S. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Advances in Neural Information Processing Systems: Proceedings of the 1995 Conference, pp. 1038–1044. MIT Press, Cambridge, MA.

  • Sutton, R.S., and Barto, A.G. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.

  • Tesauro, G.J. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38:58–68.

  • Tsitsiklis, J.N., and Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42:674–690.

  • Watkins, C.J.C.H. (1989). Learning from Delayed Rewards. Ph.D. thesis, Cambridge University.

  • Watkins, C.J.C.H., and Dayan, P. (1992). Q-learning. Machine Learning, 8:279–292.

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sutton, R.S. (1999). Open Theoretical Questions in Reinforcement Learning. In: Fischer, P., Simon, H.U. (eds) Computational Learning Theory. EuroCOLT 1999. Lecture Notes in Computer Science, vol 1572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49097-3_2

  • DOI: https://doi.org/10.1007/3-540-49097-3_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65701-9

  • Online ISBN: 978-3-540-49097-5

  • eBook Packages: Springer Book Archive
