Value Iteration over Belief Subspace

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2143)

Abstract

Partially Observable Markov Decision Processes (POMDPs) provide an elegant framework for AI planning tasks with uncertainty. Value iteration is a well-known algorithm for solving POMDPs, but it is notoriously expensive because at each step it must account for every belief state in a continuous space. In this paper, we show that value iteration can be conducted over a subset of belief space. We then study a class of POMDPs, namely informative POMDPs, where each observation provides good albeit incomplete information about the world state. For informative POMDPs, value iteration can be conducted over a small subset of belief space. This yields two advantages: first, fewer vectors are needed to represent value functions; second, value iteration can be accelerated. Empirical studies are presented to demonstrate both advantages.
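
The abstract only sketches the approach. As a rough illustration of what "value iteration over a subset of belief space" can look like in code, here is a minimal point-based sketch in Python. It is not the paper's algorithm (which performs exact backups over a belief subspace characterized by observations); the model arrays T, Z, R, the point set B, and the nearest-point value lookup are all illustrative assumptions.

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """Bayes-update belief b after taking action a and observing o.

    T[a, s, s2] = P(s2 | s, a), Z[a, s2, o] = P(o | s2, a).
    """
    bp = Z[a, :, o] * (b @ T[a])           # P(o | s') * predicted belief
    norm = bp.sum()                        # equals P(o | b, a)
    return bp / norm if norm > 0 else None

def backup(B, V, T, Z, R, gamma):
    """One value-iteration sweep over the finite belief set B only."""
    n_actions, n_obs = R.shape[0], Z.shape[2]
    V_new = np.empty_like(V)
    for i, b in enumerate(B):
        q = np.empty(n_actions)
        for a in range(n_actions):
            q[a] = b @ R[a]                                  # expected immediate reward
            for o in range(n_obs):
                p_o = (b @ T[a]) @ Z[a, :, o]                # P(o | b, a)
                if p_o > 0:
                    b2 = belief_update(b, a, o, T, Z)
                    j = np.abs(B - b2).sum(axis=1).argmin()  # nearest stored point
                    q[a] += gamma * p_o * V[j]
        V_new[i] = q.max()
    return V_new

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, O, K = 4, 2, 3, 25
    T = rng.random((A, S, S)); T /= T.sum(axis=2, keepdims=True)
    Z = rng.random((A, S, O)); Z /= Z.sum(axis=2, keepdims=True)
    R = rng.random((A, S))
    B = rng.dirichlet(np.ones(S), size=K)   # finite subset of the belief simplex
    V = np.zeros(K)
    for _ in range(60):
        V = backup(B, V, T, Z, R, gamma=0.95)
    print("values at the stored belief points:", np.round(V, 3))
```

In the paper's informative-POMDP setting, the analogue of B would be the beliefs consistent with the most recent observation, which form a small subset of the simplex; the sketch above simply uses an arbitrary finite grid to make the subspace idea concrete.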

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, W. (2001). Value Iteration over Belief Subspace. In: Benferhat, S., Besnard, P. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2001. Lecture Notes in Computer Science, vol. 2143. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44652-4_7

  • DOI: https://doi.org/10.1007/3-540-44652-4_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42464-2

  • Online ISBN: 978-3-540-44652-1
