
Space-Progressive Value Iteration: An Anytime Algorithm for a Class of POMDPs

  • Conference paper
  • In: Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2143)

Abstract

Finding optimal policies for general partially observable Markov decision processes (POMDPs) is computationally difficult, primarily due to the need to perform dynamic-programming (DP) updates over the entire belief space. In this paper, we first study a somewhat restrictive class of special POMDPs called almost discernible POMDPs and propose an anytime algorithm called space-progressive value iteration (SPVI). SPVI does not perform DP updates over the entire belief space. Rather, it restricts DP updates to a belief subspace that grows over time. It is argued that, given sufficient time, SPVI can find near-optimal policies for almost discernible POMDPs. We then show how SPVI can be applied to a more general class of POMDPs. Empirical results are presented to show the effectiveness of SPVI.
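The core idea in the abstract — restrict DP backups to a small belief subspace, iterate to (approximate) convergence there, then let the subspace grow — can be illustrated with a short sketch. This is not the authors' algorithm: the POMDP below, the point-based backup, and the successor-based subspace expansion are illustrative assumptions in the spirit of point-based value iteration, used only to show the anytime structure of restricting and then growing the set of beliefs that receive DP updates.

```python
import numpy as np

# A tiny illustrative POMDP (2 states, 2 actions, 2 observations).
# All numbers here are made up for the sketch; they are not from the paper.
T = np.array([[[0.9, 0.1], [0.1, 0.9]],    # T[a, s, s']: transition model
              [[0.5, 0.5], [0.5, 0.5]]])
Z = np.array([[[0.8, 0.2], [0.2, 0.8]],    # Z[a, s', o]: observation model
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],                  # R[a, s]: immediate reward
              [0.0, 1.0]])
GAMMA = 0.95

def backup(b, alphas):
    """One point-based DP backup at belief b; returns a new alpha-vector."""
    best = None
    for a in range(T.shape[0]):
        acc = R[a].astype(float).copy()
        for o in range(Z.shape[2]):
            # g_{a,o}(s) = sum_{s'} T[a,s,s'] Z[a,s',o] alpha(s');
            # keep the candidate that is best at this particular belief b.
            g = np.array([T[a] @ (Z[a][:, o] * alpha) for alpha in alphas])
            acc = acc + GAMMA * g[np.argmax(g @ b)]
        if best is None or acc @ b > best @ b:
            best = acc
    return best

def expand(beliefs):
    """Grow the belief subspace by adding one-step successor beliefs."""
    new = list(beliefs)
    for b in beliefs:
        for a in range(T.shape[0]):
            for o in range(Z.shape[2]):
                nb = (b @ T[a]) * Z[a][:, o]      # unnormalized Bayes update
                if nb.sum() > 1e-9:
                    nb = nb / nb.sum()
                    if all(np.abs(nb - x).max() > 1e-3 for x in new):
                        new.append(nb)
    return new

def spvi_sketch(rounds=3, sweeps=20):
    beliefs = [np.array([0.5, 0.5])]     # start from a small belief subspace
    alphas = [np.zeros(2)]
    for _ in range(rounds):              # anytime loop: each round is usable
        for _ in range(sweeps):          # DP updates restricted to the subspace
            alphas = [backup(b, alphas) for b in beliefs]
        beliefs = expand(beliefs)        # then let the subspace grow
    return beliefs, alphas
```

The anytime property comes from the outer loop: after any round, the current alpha-vectors define a usable policy over the beliefs visited so far, and further rounds refine it over a strictly larger subspace.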




Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, N.L., Zhang, W. (2001). Space-Progressive Value Iteration: An Anytime Algorithm for a Class of POMDPs. In: Benferhat, S., Besnard, P. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2001. Lecture Notes in Computer Science, vol 2143. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44652-4_8

  • DOI: https://doi.org/10.1007/3-540-44652-4_8
  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42464-2

  • Online ISBN: 978-3-540-44652-1

  • eBook Packages: Springer Book Archive
