Abstract
Motion planning in uncertain environments is a common challenge and is essential for autonomous robot operation. The determinized sparse partially observable tree (DESPOT) algorithm is a representative approach that performs reasonably well for planning under uncertainty. However, DESPOT may generate low-quality solutions due to inaccurate searches and inefficient belief-tree construction. Therefore, this paper proposes a high-efficiency online planning method built upon the DESPOT algorithm, namely, the DESPOT with discounted upper and lower bounds (DESPOT-DULB) algorithm, to simultaneously improve the efficiency and performance of motion planning. Specifically, a node’s information is represented by combining its upper and lower bounds (ULB) in the forward exploration of the action space to guide the selection of the optimal action. Then, a discount factor based on the depth of the belief tree is introduced to reduce the gap between the upper and lower bounds in both the action space and the observation space. As a result, the proposed method comprehensively represents the information of each node to ensure a near-optimal forward search. Theoretical proofs of the proposed method are provided as well. The simulation results, including comparisons on three representative scenarios and a parameter sensitivity analysis, demonstrate that the proposed method performs favorably in many examples of interest.
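The two ideas the abstract describes, scoring a node by a weighted combination of its upper and lower bounds rather than the upper bound alone, and shrinking the bound gap with a depth-based discount, can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, the weight `omega`, and the discount base `beta` are all illustrative assumptions.

```python
def node_score(upper, lower, omega):
    """Hypothetical combined ULB score: weight the node's upper and lower
    bounds instead of relying on the upper bound alone, so that the
    guaranteed value (lower bound) also influences action selection.
    omega in (0, 1) trades optimism for the guaranteed value."""
    return omega * upper + (1.0 - omega) * lower


def discounted_gap(upper, lower, depth, beta):
    """Hypothetical depth-discounted bound gap: deeper nodes contribute a
    smaller gap, narrowing the distance between the upper and lower
    bounds as the belief tree grows (beta > 1 is an assumed base)."""
    return (upper - lower) / (beta ** depth)


def select_action(children, omega):
    """Pick the action branch with the best combined ULB score.
    `children` maps each candidate action to its (upper, lower) pair."""
    return max(children, key=lambda a: node_score(*children[a], omega))


# Toy usage with made-up bound pairs for three candidate actions: a low
# omega favors the action with the strongest guaranteed (lower) bound.
children = {"left": (10.0, 2.0), "stay": (6.0, 5.0), "right": (9.0, 1.0)}
best = select_action(children, omega=0.3)  # -> "stay"
```

With `omega = 0.3`, "stay" wins (score 5.3) even though "left" has the largest upper bound, which is the kind of correction over purely optimistic search that the ULB combination is meant to provide.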
References
Dai XY, Meng QH, Jin S (2021) Uncertainty-driven active view planning in feature-based monocular vSLAM. Appl Soft Comput 108:107459
Nakrani NM, Joshi MM (2022) A human-like decision intelligence for obstacle avoidance in autonomous vehicle parking. Appl Intell 52(4):1–20
Hubmann C, Schulz J, Becker M, Althoff D, Stiller C (2018) Automated driving in uncertain environments: planning with interaction and uncertain maneuver prediction. IEEE Trans Intell Veh 3(1):5–17
Smallwood R, Sondik E (1973) The optimal control of partially observable Markov processes over a finite horizon. Oper Res 21:1071–1088
Bai H, Cai S, Ye N, Hsu D, Lee WS (2015) Intention-aware online POMDP planning for autonomous driving in a crowd. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp 454–460
Garg NP, Hsu D, Lee WS (2019) Learning to grasp under uncertainty using POMDPs. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp 2751–2757
Wu K, Lee WS, Hsu D (2015) POMDP to the rescue: boosting performance for Robocup rescue. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp 5294–5299
Folsom-Kovarik JT, Sukthankar G, Schatz S (2013) Tractable POMDP representations for intelligent tutoring systems. ACM Trans Intell Syst Technol 4(2):1–22
Kaelbling LP, Littman ML, Cassandra AR (1998) Planning and acting in partially observable stochastic domains. Artif Intell 101(1–2):99–134
Deb S, Tammi K, Gao XZ, Kalita K, Mahanta P, Cross S (2022) A robust two-stage planning model for the charging station placement problem considering road traffic uncertainty. IEEE Trans Intell Transp Syst 23(7):1–15
Sung I, Choi B, Nielsen P (2021) On the training of a neural network for online path planning with offline path planning algorithms. Int J Inf Manag 57:102142
Nicol S, Chadès I (2012) Which states matter? An application of an intelligent discretization method to solve a continuous POMDP in conservation biology. PLoS One 7(2):e28993
Browne CB, Powley E, Whitehouse D, Lucas SM, Cowling PI, Rohlfshagen P, Tavener S, Perez D, Samothrakis S, Colton S (2012) A survey of Monte Carlo tree search methods. IEEE Trans Comput Intell AI in Games 4(1):1–43
Silver D, Veness J (2010) Monte-Carlo planning in large POMDPs. Adv Neural Inf Process Syst 23:2164–2172
Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Mach Learn 47(2–3):235–256
Somani A, Ye N, Hsu D, Lee WS (2013) DESPOT: online POMDP planning with regularization. In: Advances in Neural Information Processing Systems, vol 26
Bougie N, Ichise R (2021) Fast and slow curiosity for high-level exploration in reinforcement learning. Appl Intell 51(2):1086–1107
Chen Y, Kochenderfer MJ, Spaan MTJ (2018) Improving offline value-function approximations for POMDPs by reducing discount factors. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp 3531–3536
Kurniawati H, Hsu D, Lee WS (2008) SARSOP: efficient point-based POMDP planning by approximating optimally reachable belief spaces. Robot: Sci Syst 4:65–72
Bai H, Hsu D, Lee WS (2014) Integrated perception and planning in the continuous space: a POMDP approach. Int J Robot Res 33(9):1288–1302
Zhang Z, Hsu D, Lee WS, Lim ZW, Bai A (2015) PLEASE: palm leaf search for POMDPs with large observation spaces. In: Proceedings of the Twenty-Fifth International Conference on Automated Planning and Scheduling, pp 249–258
Wu B, Zheng HY, Feng YP (2014) Point-based online value iteration algorithm in large POMDP. Appl Intell 40(3):546–555
He R, Brunskill E, Roy N (2011) Efficient planning under uncertainty with macro-actions. J Artif Intell Res 40:523–570
Ross S, Pineau J, Paquet S, Chaib-Draa B (2008) Online planning algorithms for POMDPs. J Artif Intell Res 32:663–704
Zhang S, Sridharan M, Washington C (2013) Active visual planning for mobile robot teams using hierarchical POMDPs. IEEE Trans Robot 29(4):975–985
Koval M, Hsu D, Pollard N, Srinivasa SS (2020) Configuration lattices for planar contact manipulation under uncertainty. In: Proceedings of International Workshop on the Algorithmic Foundations of Robotics, pp. 768–783
Sun K, Schlotfeldt B, Pappas GJ (2020) Stochastic motion planning under partial observability for mobile robots with continuous range measurements. IEEE Trans Robot 37(3):979–995
Vien NA, Ngo H, Lee S, Chung T (2014) Approximate planning for Bayesian hierarchical reinforcement learning. Appl Intell 41(3):808–819
Ye N, Somani A, Hsu D, Lee WS (2017) DESPOT: online POMDP planning with regularization. J Artif Intell Res 58:231–266
Garg NP, Hsu D, Lee WS (2019) DESPOT-alpha: online POMDP planning with large state and observation spaces. Robot: Sci and Syst. https://doi.org/10.15607/RSS.2019.XV.006
Luo Y, Bai H, Hsu D, Lee WS (2019) Importance sampling for online planning under uncertainty. Int J Robot Res 38(2–3):162–181
Cai P, Luo Y, Hsu D, Lee WS (2021) HyP-DESPOT: a hybrid parallel algorithm for online planning under uncertainty. Int J Robot Res 40(2–3):558–573
Wu C, Kong R, Yang G, Kong X, Zhang Z, Yu Y, Liu W (2021) LB-DESPOT: efficient online POMDP planning considering lower bound in action selection. In: Proceedings of the AAAI Conference on Artificial Intelligence 35(18):15927–15928
Yoon S, Fern A, Givan R, Kambhampati S (2008) Probabilistic planning via determinization in hindsight. In: Proceedings of AAAI Conference on Artificial Intelligence 2:1010–1016
Acknowledgments
This work was supported in part by the National Key Research and Development Program of China (No. 2021ZD0114503), the Major Research Plan of the National Natural Science Foundation of China (No. 92148204), the National Natural Science Foundation of China (No. 61803089, 61971071, 62027810, 62133005), the Hunan Science Fund for Distinguished Young Scholars (No. 2021JJ10025), the Hunan Key Research and Development Program (No. 2021GK4011, 2022GK2011), the Changsha Science and Technology Major Project (No. kh2003026), the Joint Open Foundation of the State Key Laboratory of Robotics (No. 2021-KF-22-17), the Tianjin University–Fuzhou University Independent Innovation Fund (No. TF2022-4), and the China University Industry-University-Research Innovation Fund (No. 2020HYA06006).
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix 1
Proof
Let CH(b, a∗) denote the set of all b′ = τ(b, a∗, z) for some \( z\in {Z}_{b,{a}^{\ast }} \), that is, the set of child nodes of b in the DESPOT-DULB tree. If E(b) > 0, then ε(b) > 0, that is, μ(b) − l(b) > 0, and thus μ(b) ≠ l0(b). Hence we have
and
where 0 < ω < 1 and β > 1. Subtracting Eq. (A2) from Eq. (A1), we have
where
Since 0 < 1 − ω < 1 and κ > 1, we have
That is
Combining Eqs. (A4) and (A6), we have
Note that
Hence, we have
That is, \( E(b)\le \sum \limits_{z\in {Z}_{b,{a}^{\ast }}}E\left({b}^{\prime}\right) \).
Appendix 2
Proof
Let U0′(b) = U0(b) + δ; then U0′ is an exact upper bound. Let μ0′ be the corresponding initial upper bound, and μ′ the corresponding upper bound on ν∗(b). Then μ0′ is a valid initial upper bound for ν∗(b), and the backup equations ensure that μ′(b) is a valid upper bound for ν∗(b). On the other hand, it is easily shown by induction that
As a special case, for b = b0, we have
Hence, when the algorithm terminates, we have
Equivalently,
Eq. (B13) holds because the initialization and the computation of the lower bound l via the backup equations are exactly those used to find the value of a regularized optimal policy in the partial DESPOT-DULB tree.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, Y., Liu, J., Huang, Y. et al. High-efficiency online planning using composite bounds search under partial observation. Appl Intell 53, 8146–8159 (2023). https://doi.org/10.1007/s10489-022-03914-5