Value Iteration over Belief Subspace

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2143)

Abstract

Partially Observable Markov Decision Processes (POMDPs) provide an elegant framework for AI planning tasks with uncertainty. Value iteration is a well-known algorithm for solving POMDPs, but it is notoriously expensive because at each step it must account for every belief state in a continuous space. In this paper, we show that value iteration can be conducted over a subset of belief space. We then study a class of POMDPs, namely informative POMDPs, where each observation provides good albeit incomplete information about the world state. For informative POMDPs, value iteration can be conducted over a small subset of belief space. This yields two advantages: first, fewer vectors are needed to represent value functions; second, value iteration can be accelerated. Empirical studies are presented to demonstrate both advantages.
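
The abstract only sketches the approach. As a rough illustration of what "value iteration over a subset of belief space" can look like in code, here is a minimal point-based sketch in Python. It is not the paper's algorithm (which performs exact backups over a belief subspace characterized by observations); the model arrays T, Z, R, the point set B, and the nearest-point value lookup are all illustrative assumptions.

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """Bayes-update belief b after taking action a and observing o.

    T[a, s, s2] = P(s2 | s, a), Z[a, s2, o] = P(o | s2, a).
    """
    bp = Z[a, :, o] * (b @ T[a])           # P(o | s') * predicted belief
    norm = bp.sum()                        # equals P(o | b, a)
    return bp / norm if norm > 0 else None

def backup(B, V, T, Z, R, gamma):
    """One value-iteration sweep over the finite belief set B only."""
    n_actions, n_obs = R.shape[0], Z.shape[2]
    V_new = np.empty_like(V)
    for i, b in enumerate(B):
        q = np.empty(n_actions)
        for a in range(n_actions):
            q[a] = b @ R[a]                                  # expected immediate reward
            for o in range(n_obs):
                p_o = (b @ T[a]) @ Z[a, :, o]                # P(o | b, a)
                if p_o > 0:
                    b2 = belief_update(b, a, o, T, Z)
                    j = np.abs(B - b2).sum(axis=1).argmin()  # nearest stored point
                    q[a] += gamma * p_o * V[j]
        V_new[i] = q.max()
    return V_new

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, O, K = 4, 2, 3, 25
    T = rng.random((A, S, S)); T /= T.sum(axis=2, keepdims=True)
    Z = rng.random((A, S, O)); Z /= Z.sum(axis=2, keepdims=True)
    R = rng.random((A, S))
    B = rng.dirichlet(np.ones(S), size=K)   # finite subset of the belief simplex
    V = np.zeros(K)
    for _ in range(60):
        V = backup(B, V, T, Z, R, gamma=0.95)
    print("values at the stored belief points:", np.round(V, 3))
```

In the paper's informative-POMDP setting, the analogue of B would be the beliefs consistent with the most recent observation, which form a small subset of the simplex; the sketch above simply uses an arbitrary finite grid to make the subspace idea concrete.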

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, W. (2001). Value Iteration over Belief Subspace. In: Benferhat, S., Besnard, P. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2001. Lecture Notes in Computer Science, vol. 2143. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44652-4_7

  • DOI: https://doi.org/10.1007/3-540-44652-4_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42464-2

  • Online ISBN: 978-3-540-44652-1
