
Point-based online value iteration algorithm in large POMDP


Abstract

The partially observable Markov decision process (POMDP) is an ideal framework for sequential decision-making under uncertainty in stochastic domains. However, solving POMDPs is notoriously intractable, which makes them hard to apply in real-time systems. To address this problem, this paper proposes a point-based online value iteration (PBOVI) algorithm that speeds up computation by performing value backups only at specific reachable belief points rather than over the entire belief simplex, exploits a branch-and-bound pruning approach to prune the AND/OR tree of belief states online, and introduces a novel scheme for reusing belief states that have already been searched, thereby avoiding repeated computation. Experimental and simulation results show that the proposed algorithm simultaneously satisfies the low-error and high-timeliness requirements of real-time systems.
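
The abstract names three ingredients: backups restricted to reachable belief points, branch-and-bound pruning of the online AND/OR tree, and reuse of beliefs already searched. The sketch below illustrates only the first and third in their generic textbook form, assuming a flat POMDP represented by NumPy arrays T[a, s, s'] = P(s'|s, a), Z[a, s', o] = P(o|s', a), and R[a, s]; it is an illustration of the general technique, not the authors' PBOVI implementation, and the branch-and-bound step is omitted.

```python
# A minimal sketch, under the assumed array representation described above, of
# point-based backups at reachable belief points plus a cache that reuses
# beliefs that have already been searched.  Not the paper's actual code.

import numpy as np

def belief_update(b, a, o, T, Z):
    """Bayes update: b'(s') is proportional to Z[a, s', o] * sum_s T[a, s, s'] b(s)."""
    bp = Z[a, :, o] * (b @ T[a])
    norm = bp.sum()
    return bp / norm if norm > 0 else bp

def point_based_backup(b, V, T, Z, R, gamma):
    """One point-based value backup at belief b against the alpha-vector set V."""
    n_actions, n_states, _ = T.shape
    n_obs = Z.shape[2]
    if not V:                                      # seed with a zero vector
        V = [np.zeros(n_states)]
    best_alpha, best_value = None, -np.inf
    for a in range(n_actions):
        alpha_a = R[a].astype(float)               # immediate reward vector
        for o in range(n_obs):
            # g_{a,o}(s) = sum_{s'} T[a,s,s'] Z[a,s',o] alpha(s'), for each alpha in V
            candidates = [T[a] @ (Z[a, :, o] * alpha) for alpha in V]
            alpha_a = alpha_a + gamma * max(candidates, key=lambda g: g @ b)
        if alpha_a @ b > best_value:
            best_alpha, best_value = alpha_a, alpha_a @ b
    return best_alpha

def backup_reachable(b0, V, T, Z, R, gamma, depth, cache=None):
    """Back up only beliefs reachable from b0, reusing beliefs already searched."""
    cache = {} if cache is None else cache
    key = tuple(np.round(b0, 6))
    if depth == 0 or key in cache:                 # reuse: skip repeated beliefs
        return cache
    cache[key] = point_based_backup(b0, V, T, Z, R, gamma)
    n_actions, n_obs = T.shape[0], Z.shape[2]
    for a in range(n_actions):
        for o in range(n_obs):
            if (Z[a, :, o] * (b0 @ T[a])).sum() > 0:   # only reachable (a, o) branches
                backup_reachable(belief_update(b0, a, o, T, Z),
                                 V, T, Z, R, gamma, depth - 1, cache)
    return cache
```

In this sketch the cache plays the role of the belief-reuse idea, and only zero-probability (action, observation) branches are skipped; in the full algorithm, branch-and-bound bounds would additionally prune branches of the AND/OR tree that cannot improve the current value estimate.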




Acknowledgements

The authors wish to thank Min Wu, Weihua Cao, Xin Chen, and Yong He for their assistance and advice. We would also like to thank the anonymous reviewers for their helpful comments and suggestions. This work was supported by the NNSF of China under grants 61074058 and 60874042, and by the Shenzhen Technology Innovation Program of China under grant JCYJ20120617134831736.

Author information

Corresponding author

Correspondence to Bo Wu.


About this article

Cite this article

Wu, B., Zheng, HY. & Feng, YP. Point-based online value iteration algorithm in large POMDP. Appl Intell 40, 546–555 (2014). https://doi.org/10.1007/s10489-013-0479-8
