
Space-Progressive Value Iteration: An Anytime Algorithm for a Class of POMDPs

  • Conference paper
  • In: Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2143)

Abstract

Finding optimal policies for general partially observable Markov decision processes (POMDPs) is computationally difficult, primarily due to the need to perform dynamic-programming (DP) updates over the entire belief space. In this paper, we first study a somewhat restrictive class of special POMDPs called almost discernible POMDPs and propose an anytime algorithm called space-progressive value iteration (SPVI). SPVI does not perform DP updates over the entire belief space. Rather, it restricts DP updates to a belief subspace that grows over time. It is argued that, given sufficient time, SPVI can find near-optimal policies for almost discernible POMDPs. We then show how SPVI can be applied to a more general class of POMDPs. Empirical results are presented to show the effectiveness of SPVI.
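The core idea in the abstract — restrict DP backups to a small belief subspace, iterate to (approximate) convergence there, then let the subspace grow — can be illustrated with a short sketch. This is not the authors' algorithm: the POMDP below, the point-based backup, and the successor-based subspace expansion are illustrative assumptions in the spirit of point-based value iteration, used only to show the anytime structure of restricting and then growing the set of beliefs that receive DP updates.

```python
import numpy as np

# A tiny illustrative POMDP (2 states, 2 actions, 2 observations).
# All numbers here are made up for the sketch; they are not from the paper.
T = np.array([[[0.9, 0.1], [0.1, 0.9]],    # T[a, s, s']: transition model
              [[0.5, 0.5], [0.5, 0.5]]])
Z = np.array([[[0.8, 0.2], [0.2, 0.8]],    # Z[a, s', o]: observation model
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],                  # R[a, s]: immediate reward
              [0.0, 1.0]])
GAMMA = 0.95

def backup(b, alphas):
    """One point-based DP backup at belief b; returns a new alpha-vector."""
    best = None
    for a in range(T.shape[0]):
        acc = R[a].astype(float).copy()
        for o in range(Z.shape[2]):
            # g_{a,o}(s) = sum_{s'} T[a,s,s'] Z[a,s',o] alpha(s');
            # keep the candidate that is best at this particular belief b.
            g = np.array([T[a] @ (Z[a][:, o] * alpha) for alpha in alphas])
            acc = acc + GAMMA * g[np.argmax(g @ b)]
        if best is None or acc @ b > best @ b:
            best = acc
    return best

def expand(beliefs):
    """Grow the belief subspace by adding one-step successor beliefs."""
    new = list(beliefs)
    for b in beliefs:
        for a in range(T.shape[0]):
            for o in range(Z.shape[2]):
                nb = (b @ T[a]) * Z[a][:, o]      # unnormalized Bayes update
                if nb.sum() > 1e-9:
                    nb = nb / nb.sum()
                    if all(np.abs(nb - x).max() > 1e-3 for x in new):
                        new.append(nb)
    return new

def spvi_sketch(rounds=3, sweeps=20):
    beliefs = [np.array([0.5, 0.5])]     # start from a small belief subspace
    alphas = [np.zeros(2)]
    for _ in range(rounds):              # anytime loop: each round is usable
        for _ in range(sweeps):          # DP updates restricted to the subspace
            alphas = [backup(b, alphas) for b in beliefs]
        beliefs = expand(beliefs)        # then let the subspace grow
    return beliefs, alphas
```

The anytime property comes from the outer loop: after any round, the current alpha-vectors define a usable policy over the beliefs visited so far, and further rounds refine it over a strictly larger subspace.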




Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, N.L., Zhang, W. (2001). Space-Progressive Value Iteration: An Anytime Algorithm for a Class of POMDPs. In: Benferhat, S., Besnard, P. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2001. Lecture Notes in Computer Science, vol 2143. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44652-4_8

  • DOI: https://doi.org/10.1007/3-540-44652-4_8
  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42464-2

  • Online ISBN: 978-3-540-44652-1

  • eBook Packages: Springer Book Archive
