Abstract
The partially observable Markov decision process (POMDP) is an ideal framework for sequential decision-making under uncertainty in stochastic domains. However, solving a POMDP is notoriously computationally intractable for real-time systems. To address this problem, this paper proposes a point-based online value iteration (PBOVI) algorithm that performs value backups at specific reachable belief points, rather than over the entire belief simplex, to speed up computation; exploits a branch-and-bound pruning approach to prune the AND/OR tree of belief states online; and reuses belief states that have already been searched to avoid repeated computation. Experimental and simulation results show that the proposed algorithm can simultaneously satisfy the requirements of low error and high timeliness in real-time systems.
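The full PBOVI procedure (online tree search with branch-and-bound pruning and belief reuse) is developed in the paper itself. As a rough illustration of the point-based backup step the abstract refers to, the sketch below shows a generic PBVI-style value backup over a finite set of reachable belief points. All names (point_based_backup, T, O, R, beliefs, alphas) and the dense tabular numpy representation are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def point_based_backup(beliefs, alphas, T, O, R, gamma):
    """One point-based value backup over a finite set of belief points.

    beliefs : list of length-|S| arrays (sampled reachable belief points)
    alphas  : list of length-|S| arrays (current alpha-vector set)
    T[a]    : |S| x |S| transition matrix, T[a][s, s'] = P(s' | s, a)
    O[a]    : |S| x |Z| observation matrix, O[a][s', z] = P(z | s', a)
    R[a]    : length-|S| immediate reward vector for action a
    gamma   : discount factor in (0, 1)
    Returns one new alpha-vector per belief point.
    """
    num_actions = len(T)
    num_obs = O[0].shape[1]

    # Precompute g_{a,z}^i(s) = gamma * sum_{s'} T(s'|s,a) O(z|s',a) alpha_i(s')
    g = {(a, z): [gamma * T[a] @ (O[a][:, z] * alpha) for alpha in alphas]
         for a in range(num_actions) for z in range(num_obs)}

    new_alphas = []
    for b in beliefs:
        best_val, best_vec = -np.inf, None
        for a in range(num_actions):
            # For each observation, pick the projected alpha-vector best at b,
            # then add the immediate reward for action a.
            vec = R[a] + sum(max(g[(a, z)], key=lambda v: b @ v)
                             for z in range(num_obs))
            if b @ vec > best_val:
                best_val, best_vec = b @ vec, vec
        new_alphas.append(best_vec)
    return new_alphas
```

Restricting the backup to the sampled belief set keeps the cost per iteration polynomial in the number of points, which is the source of the speed-up the abstract claims; the online search, pruning, and belief-reuse components are additional to this core step.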






Acknowledgements
The authors wish to thank Min Wu, Weihua Cao, Xin Chen, and Yong He for their assistance and advice. We would also like to thank the anonymous reviewers for their helpful comments and suggestions. This work was supported by the NNSF of China under grants 61074058 and 60874042, and by the Shenzhen Technology Innovation Program of China under grant JCYJ20120617134831736.
Cite this article
Wu, B., Zheng, HY. & Feng, YP. Point-based online value iteration algorithm in large POMDP. Appl Intell 40, 546–555 (2014). https://doi.org/10.1007/s10489-013-0479-8