Bounded parameter Markov decision processes

  • Conference paper
  • In: Recent Advances in AI Planning (ECP 1997)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1348)

Abstract

In this paper, we introduce the notion of a bounded parameter Markov decision process (BMDP) as a generalization of the familiar exact MDP. A bounded parameter MDP is a set of exact MDPs specified by giving upper and lower bounds on transition probabilities and rewards (all the MDPs in the set share the same state and action space). BMDPs form an efficiently solvable special case of the previously studied class of MDPs with imprecise parameters (MDPIPs). Bounded parameter MDPs can be used to represent variation or uncertainty concerning the parameters of sequential decision problems in cases where no prior probabilities on the parameter values are available. Bounded parameter MDPs can also be used in aggregation schemes to represent the variation in the transition probabilities for different base states aggregated together in the same aggregate state.
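
To make the object concrete, here is a minimal sketch of how a small BMDP might be written down as arrays of interval endpoints. This is illustrative only, not taken from the paper; all names and numbers are hypothetical.

```python
import numpy as np

# A hypothetical two-state, one-action BMDP: each transition probability
# and each reward is known only up to a closed interval.
S, A = 2, 1
p_lo = np.array([[[0.6, 0.1],      # p_lo[a][s][s']: lower transition bounds
                  [0.2, 0.5]]])
p_hi = np.array([[[0.9, 0.4],      # p_hi[a][s][s']: upper transition bounds
                  [0.5, 0.8]]])
r_lo = np.array([[0.0, 1.0]])      # r_lo[a][s]: lower reward bounds
r_hi = np.array([[0.5, 2.0]])      # r_hi[a][s]: upper reward bounds

# Any exact MDP whose parameters fall inside these intervals, with each row
# p[a][s][:] summing to 1, is a member of the set this BMDP denotes; note
# that every row here satisfies sum(lower) <= 1 <= sum(upper).
```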

We introduce interval value functions as a natural extension of traditional value functions. An interval value function assigns a closed real interval to each state, representing the assertion that the value of that state falls within that interval. An interval value function can be used to bound the performance of a policy over the set of exact MDPs associated with a given bounded parameter MDP. We describe an iterative dynamic programming algorithm called interval policy evaluation which computes an interval value function for a given BMDP and specified policy. Interval policy evaluation on a policy π computes the most restrictive interval value function that is sound, i. e., that bounds the value function for π in every exact MDP in the set defined by the bounded parameter MDP. We define optimistic and pessimistic notions of optimal policy, and provide a variant of value iteration [Bellman, 1957] that we call interval value iteration which computes a policies for a BMDP that are optimal in these senses.
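
The abstract does not spell out the update rule, so the following is only a minimal sketch of what such a coupled lower/upper iteration might look like, under assumed details: a discount factor gamma, the array layout from the previous sketch, and the standard greedy sort-and-fill construction of the extremal transition distribution within the interval bounds. All function names are hypothetical.

```python
import numpy as np

def extremal_expectation(lo, hi, v, maximize):
    """Pick a distribution p with lo <= p <= hi and sum(p) = 1 that
    extremizes the expectation p . v, by starting from the lower bounds
    and greedily shifting the remaining mass toward the best (or worst)
    successor values. Assumes sum(lo) <= 1 <= sum(hi)."""
    p = lo.astype(float).copy()
    slack = 1.0 - p.sum()
    order = np.argsort(v)                  # ascending: worst values first
    if maximize:
        order = order[::-1]                # descending: best values first
    for j in order:
        give = min(slack, hi[j] - p[j])
        p[j] += give
        slack -= give
        if slack <= 1e-12:
            break
    return float(p @ v)

def interval_policy_evaluation(p_lo, p_hi, r_lo, r_hi, policy,
                               gamma=0.9, iters=1000, tol=1e-8):
    """Iterate coupled lower/upper Bellman backups for a fixed policy,
    returning per-state value bounds (v_lo, v_hi). Parameter arrays are
    shaped as in the previous sketch; policy[s] is an action index."""
    n = r_lo.shape[1]
    v_lo, v_hi = np.zeros(n), np.zeros(n)
    for _ in range(iters):
        new_lo = np.array([r_lo[policy[s], s] + gamma * extremal_expectation(
            p_lo[policy[s], s], p_hi[policy[s], s], v_lo, maximize=False)
            for s in range(n)])
        new_hi = np.array([r_hi[policy[s], s] + gamma * extremal_expectation(
            p_lo[policy[s], s], p_hi[policy[s], s], v_hi, maximize=True)
            for s in range(n)])
        done = max(np.abs(new_lo - v_lo).max(),
                   np.abs(new_hi - v_hi).max()) < tol
        v_lo, v_hi = new_lo, new_hi
        if done:
            break
    return v_lo, v_hi

# Usage with the arrays from the previous sketch, under the single action:
# v_lo, v_hi = interval_policy_evaluation(p_lo, p_hi, r_lo, r_hi,
#                                         np.array([0, 0]))
```

Under these assumptions, the lower bound backs up against the worst-case member MDP and the upper bound against the best case, so the resulting interval brackets the policy's value in every exact MDP in the set.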

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Bellman, Richard 1957. Dynamic Programming. Princeton University Press.

  2. Bertsekas, D. P. and Castañon, D. A. 1989. Adaptive aggregation for infinite horizon dynamic programming. IEEE Transactions on Automatic Control 34(6):589–598.

  3. Boutilier, Craig and Dearden, Richard 1994. Using abstractions for decision theoretic planning with time constraints. In Proceedings AAAI-94. AAAI. 1016–1022.

  4. Boutilier, Craig; Dean, Thomas; and Hanks, Steve 1995a. Planning under uncertainty: Structural assumptions and computational leverage. In Proceedings of the Third European Workshop on Planning.

  5. Boutilier, Craig; Dearden, Richard; and Goldszmidt, Moises 1995b. Exploiting structure in policy construction. In Proceedings IJCAI-95. 1104–1111.

  6. Dean, Thomas and Givan, Robert 1997. Model minimization in Markov decision processes. In Proceedings AAAI-97. AAAI.

  7. Dean, Thomas; Givan, Robert; and Leach, Sonia 1997. Model reduction techniques for computing approximately optimal solutions for Markov decision processes. In Thirteenth Conference on Uncertainty in Artificial Intelligence.

  8. Littman, Michael; Dean, Thomas; and Kaelbling, Leslie 1995. On the complexity of solving Markov decision problems. In Eleventh Conference on Uncertainty in Artificial Intelligence. 394–402.

  9. Littman, Michael L. 1997. Probabilistic propositional planning: Representations and complexity. In Proceedings AAAI-97. AAAI.

  10. Lovejoy, William S. 1991. A survey of algorithmic methods for partially observed Markov decision processes. Annals of Operations Research 28:47–66.

  11. Puterman, Martin L. 1994. Markov Decision Processes. John Wiley & Sons, New York.

  12. Satia, J. K. and Lave, R. E. 1973. Markovian decision processes with uncertain transition probabilities. Operations Research 21:728–740.

  13. White, C. C. and Eldeib, H. K. 1986. Parameter imprecision in finite state, finite action dynamic programs. Operations Research 34:120–129.

  14. White, C. C. and Eldeib, H. K. 1994. Markov decision processes with imprecise transition probabilities. Operations Research 43:739–749.

Editor information

Sam Steel, Rachid Alami

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Givan, R., Leach, S., Dean, T. (1997). Bounded parameter Markov decision processes. In: Steel, S., Alami, R. (eds) Recent Advances in AI Planning. ECP 1997. Lecture Notes in Computer Science, vol 1348. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63912-8_89

  • DOI: https://doi.org/10.1007/3-540-63912-8_89

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63912-1

  • Online ISBN: 978-3-540-69665-0

  • eBook Packages: Springer Book Archive
