
Point-based online value iteration algorithm in large POMDP


Abstract

The partially observable Markov decision process (POMDP) is an ideal framework for sequential decision-making under uncertainty in stochastic domains. However, solving POMDPs is notoriously intractable, which makes them hard to apply in real-time systems. To address this problem, this paper proposes a point-based online value iteration (PBOVI) algorithm that speeds up computation by performing value backups only at specific reachable belief points rather than over the entire belief simplex, exploits a branch-and-bound pruning approach to prune the AND/OR tree of belief states online, and introduces a novel scheme for reusing belief states that have already been searched, thereby avoiding repeated computation. Experimental and simulation results show that the proposed algorithm simultaneously satisfies the low-error and high-timeliness requirements of real-time systems.
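
The abstract names three ingredients: backups restricted to reachable belief points, branch-and-bound pruning of the online AND/OR tree, and reuse of beliefs already searched. The sketch below illustrates only the first and third in their generic textbook form, assuming a flat POMDP represented by NumPy arrays T[a, s, s'] = P(s'|s, a), Z[a, s', o] = P(o|s', a), and R[a, s]; it is an illustration of the general technique, not the authors' PBOVI implementation, and the branch-and-bound step is omitted.

```python
# A minimal sketch, under the assumed array representation described above, of
# point-based backups at reachable belief points plus a cache that reuses
# beliefs that have already been searched.  Not the paper's actual code.

import numpy as np

def belief_update(b, a, o, T, Z):
    """Bayes update: b'(s') is proportional to Z[a, s', o] * sum_s T[a, s, s'] b(s)."""
    bp = Z[a, :, o] * (b @ T[a])
    norm = bp.sum()
    return bp / norm if norm > 0 else bp

def point_based_backup(b, V, T, Z, R, gamma):
    """One point-based value backup at belief b against the alpha-vector set V."""
    n_actions, n_states, _ = T.shape
    n_obs = Z.shape[2]
    if not V:                                      # seed with a zero vector
        V = [np.zeros(n_states)]
    best_alpha, best_value = None, -np.inf
    for a in range(n_actions):
        alpha_a = R[a].astype(float)               # immediate reward vector
        for o in range(n_obs):
            # g_{a,o}(s) = sum_{s'} T[a,s,s'] Z[a,s',o] alpha(s'), for each alpha in V
            candidates = [T[a] @ (Z[a, :, o] * alpha) for alpha in V]
            alpha_a = alpha_a + gamma * max(candidates, key=lambda g: g @ b)
        if alpha_a @ b > best_value:
            best_alpha, best_value = alpha_a, alpha_a @ b
    return best_alpha

def backup_reachable(b0, V, T, Z, R, gamma, depth, cache=None):
    """Back up only beliefs reachable from b0, reusing beliefs already searched."""
    cache = {} if cache is None else cache
    key = tuple(np.round(b0, 6))
    if depth == 0 or key in cache:                 # reuse: skip repeated beliefs
        return cache
    cache[key] = point_based_backup(b0, V, T, Z, R, gamma)
    n_actions, n_obs = T.shape[0], Z.shape[2]
    for a in range(n_actions):
        for o in range(n_obs):
            if (Z[a, :, o] * (b0 @ T[a])).sum() > 0:   # only reachable (a, o) branches
                backup_reachable(belief_update(b0, a, o, T, Z),
                                 V, T, Z, R, gamma, depth - 1, cache)
    return cache
```

In this sketch the cache plays the role of the belief-reuse idea, and only zero-probability (action, observation) branches are skipped; in the full algorithm, branch-and-bound bounds would additionally prune branches of the AND/OR tree that cannot improve the current value estimate.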




Acknowledgements

The authors wish to thank Min Wu, Weihua Cao, Xin Chen, and Yong He for their assistance and advice. We would also like to thank the anonymous reviewers for their helpful comments and suggestions. This work was supported by the NNSF of China under grants 61074058 and 60874042, and by the Shenzhen Technology Innovation Program of China under grant JCYJ20120617134831736.

Author information

Corresponding author

Correspondence to Bo Wu.


About this article

Cite this article

Wu, B., Zheng, HY. & Feng, YP. Point-based online value iteration algorithm in large POMDP. Appl Intell 40, 546–555 (2014). https://doi.org/10.1007/s10489-013-0479-8
