Bounded Parameter Markov Decision Processes with Average Reward Criterion

Tewari, Ambuj; Bartlett, Peter L.

doi:10.1007/978-3-540-72927-3_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4539)

Included in the following conference series:

International Conference on Computational Learning Theory

Abstract

Bounded parameter Markov Decision Processes (BMDPs) address uncertainty in the parameters of a Markov Decision Process (MDP). Unlike for an MDP, the notion of an optimal policy for a BMDP is not entirely straightforward. We consider two notions of optimality, based on optimistic and pessimistic criteria. These have previously been analyzed for discounted BMDPs; here we provide results for average reward BMDPs.
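
To make the two criteria concrete, here is a minimal Python sketch of interval value iteration for a discounted BMDP, in the style of Givan, Leach and Dean (2000). It is not the average-reward algorithm of this paper. Each transition probability P(s'|s,a) is assumed to lie in an interval [P_low, P_high]. The hypothetical helper `extremal_distribution` picks, for each state-action pair, the distribution inside the intervals that is best (optimistic) or worst (pessimistic) for the current value estimate by greedily assigning the free probability mass to states in order of value. All names, array shapes and the two-state example are illustrative assumptions.

```python
import numpy as np

def extremal_distribution(lower, upper, values, optimistic=True):
    """Hypothetical helper: choose p with lower <= p <= upper and sum(p) == 1
    that maximizes (optimistic) or minimizes (pessimistic) p @ values.

    Assumes sum(lower) <= 1 <= sum(upper), so such a p exists."""
    order = np.argsort(values)          # states from lowest to highest value
    if optimistic:
        order = order[::-1]             # optimistic: fill high-value states first
    p = np.array(lower, dtype=float)    # start every state at its lower bound
    remaining = 1.0 - p.sum()           # free mass still to distribute
    for s in order:
        add = min(upper[s] - lower[s], remaining)
        p[s] += add
        remaining -= add
        if remaining <= 0:
            break
    return p

def interval_value_iteration(R, P_low, P_high, gamma, optimistic=True,
                             tol=1e-10, max_iter=10_000):
    """Discounted interval value iteration (sketch, not the paper's method).

    R: (S, A) rewards; P_low, P_high: (S, A, S) transition bounds.
    Actions are always chosen by max; the transition model inside the
    intervals is chosen to be best (optimistic) or worst (pessimistic)."""
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                p = extremal_distribution(P_low[s, a], P_high[s, a], V, optimistic)
                Q[s, a] = R[s, a] + gamma * p @ V
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

# Hypothetical 2-state, 1-action example: reward 1 only in state 1,
# and every transition probability lies in [0.2, 0.8].
R = np.array([[0.0], [1.0]])
P_low = np.full((2, 1, 2), 0.2)
P_high = np.full((2, 1, 2), 0.8)
V_opt = interval_value_iteration(R, P_low, P_high, gamma=0.9, optimistic=True)
V_pes = interval_value_iteration(R, P_low, P_high, gamma=0.9, optimistic=False)
print(V_opt, V_pes)
```

With γ = 0.9 the optimistic values converge to (7.2, 8.2) and the pessimistic ones to (1.8, 2.8): the optimistic model sends as much mass as allowed (0.8) toward the rewarding state, the pessimistic one as little as allowed (0.2). The greedy fill solves the inner linear program over the interval constraints by sorting, which keeps each backup cheap.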

We establish a fundamental relationship between the discounted and the average reward problems, prove the existence of Blackwell optimal policies and, for both notions of optimality, derive algorithms that converge to the optimal value function.
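
For an ordinary MDP under a fixed policy there is a classical link between the discounted and average reward criteria: for a finite, irreducible Markov chain with transition matrix P and reward vector r, the scaled discounted value (1 - γ)V_γ tends to the average reward ρ as γ → 1. The Python sketch below checks this numerically. It illustrates only that classical fact, not the BMDP relationship established in the paper, and the two-state chain is an arbitrary hypothetical example.

```python
import numpy as np

def discounted_value(P, r, gamma):
    """Solve V = r + gamma * P V, i.e. V = (I - gamma P)^{-1} r."""
    n = len(r)
    return np.linalg.solve(np.eye(n) - gamma * P, r)

def average_reward(P, r):
    """Average reward rho = mu @ r, where mu is the stationary distribution.

    Assumes the chain is irreducible, so mu is unique."""
    n = len(r)
    # Stack the balance equations (P^T - I) mu = 0 with the constraint sum(mu) = 1.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.concatenate([np.zeros(n), [1.0]])
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(mu @ r)

# Hypothetical two-state chain induced by some fixed policy.
P = np.array([[0.5, 0.5],
              [0.2, 0.8]])
r = np.array([0.0, 1.0])

rho = average_reward(P, r)
for gamma in (0.9, 0.99, 0.999):
    scaled = (1 - gamma) * discounted_value(P, r, gamma)
    print(f"gamma={gamma}: (1-gamma)V = {scaled}, rho = {rho:.6f}")
```

For this chain ρ = 5/7, and the gap between each entry of (1 - γ)V_γ and ρ shrinks roughly in proportion to 1 - γ.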

Author information

Authors and Affiliations

University of California, Berkeley, Division of Computer Science, 544 Soda Hall # 1776, Berkeley, CA 94720-1776, USA
Ambuj Tewari
University of California, Berkeley, Division of Computer Science and Department of Statistics, 387 Soda Hall # 1776, Berkeley, CA 94720-1776, USA
Peter L. Bartlett

Editor information

Nader H. Bshouty, Claudio Gentile

About this paper

Cite this paper

Tewari, A., Bartlett, P.L. (2007). Bounded Parameter Markov Decision Processes with Average Reward Criterion. In: Bshouty, N.H., Gentile, C. (eds) Learning Theory. COLT 2007. Lecture Notes in Computer Science, vol 4539. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72927-3_20

DOI: https://doi.org/10.1007/978-3-540-72927-3_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72925-9
Online ISBN: 978-3-540-72927-3
eBook Packages: Computer Science, Computer Science (R0)