Skip to main content

A Stochastic Point-Based Algorithm for POMDPs

  • Conference paper
Book cover Advances in Artificial Intelligence (Canadian AI 2008)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 5032))

  • 1441 Accesses

Abstract

We introduce a new backup operator for point-based POMDP algorithms which performs a look-ahead search at depth greater than one. We apply this operator into a new algorithm, called Stochastic Search Value Iteration (SSVI). This new algorithm relies on stochastic exploration of the environment in order to update the value function. This is in opposition with existing POMDP point-based algorithms. The underlying ideas on which SSVI is based are very similar to temporal difference learning algorithms for MDPs. In particular, SSVI takes advantage of a soft-max action selection function and of the random character of the environment itself. Empirical results on usual benchmark problems show that our algorithm performs a bit better and a bit faster than HSVI2, the state of the art algorithm. This suggests that stochastic algorithms are an alternative for solving large POMDPs.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Pineau, J., Gordon, G., Thrun, S.: Point-based value iteration: An anytime algorithm for pomdps. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2003), Acapulco, Mexico, pp. 1025–1032 (2003)

    Google Scholar 

  2. Smith, T., Simmons, R.: Heuristic search value iteration for pomdps. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI 2004), Banff, Canada (2004)

    Google Scholar 

  3. Vlassis, N., Spaan, M.T.J.: A fast point-based algorithm for POMDPs. In: Benelearn 2004: Proceedings of the Annual Machine Learning Conference of Belgium and the Netherlands, Brussels, Belgium, January 2004, pp. 170–176 (2003); (Also presented at the NIPS-16 workshop ‘Planning for the Real-World’, Whistler, Canada, December 2003)

    Google Scholar 

  4. Paquet, S., Tobin, L., Chaib-draa, B.: An Online POMDP Algorithm for Complex Multiagent Environments. In: Proceedings of The fourth International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS 2005), Utrecht, The Netherlands (2005)

    Google Scholar 

  5. Paquet, S., Chaib-draa, B., Ross, S.: Hybrid POMDP Algorithms. In: Proceedings of The Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSMD 2006), Hakodate, Hokkaido, Japan (2006)

    Google Scholar 

  6. Ross, S., Chaib-draa, B.: AEMS: An Anytime Online Search Algorithm for Approximate Policy Refinement in Large POMDPs. In: Proceedings of The 20th Joint Conference in Artificial Intelligence (IJCAI 2007), Hyderabad, India (2007)

    Google Scholar 

  7. Sondik, E.J.: The Optimal Control of Partially Observable Markov Processes. PhD thesis, Stanford University (1971)

    Google Scholar 

  8. Smith, T., Simmons, R.: Point-based POMDP Algorithms: Improved Analysis and Implementation. In: Proceedings of the 21th Conference on Uncertainty in Artificial Intelligence (UAI 2005), Edinburgh, Scotland (2005)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Sabine Bergler

Rights and permissions

Reprints and permissions

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Laviolette, F., Tobin, L. (2008). A Stochastic Point-Based Algorithm for POMDPs. In: Bergler, S. (eds) Advances in Artificial Intelligence. Canadian AI 2008. Lecture Notes in Computer Science(), vol 5032. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68825-9_31

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-68825-9_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68821-1

  • Online ISBN: 978-3-540-68825-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics