High-efficiency online planning using composite bounds search under partial observation

Abstract

Motion planning in uncertain environments is a common challenge and is essential for autonomous robot operation. Representatively, the determinized sparse partially observable tree (DESPOT) algorithm shows reasonable performance for planning under uncertainty. However, DESPOT may produce low-quality solutions due to inaccurate search and inefficient belief tree construction. This paper therefore proposes a high-efficiency online planning method built upon the DESPOT algorithm, namely, the DESPOT with discounted upper and lower bounds (DESPOT-DULB) algorithm, which improves the efficiency and performance of motion planning simultaneously. Specifically, during forward exploration of the action space, each node's information is represented by combining its upper and lower bounds (ULB) to better guide optimal action selection. A discount factor based on the depth of the belief tree is then introduced to reduce the gap between the upper and lower bounds in both the action and observation spaces. As a result, the proposed method represents node information comprehensively enough to ensure a near-optimal forward search. Theoretical proofs of the proposed method are provided as well. Simulation results, including comparisons on three representative scenarios and a parameter sensitivity analysis, demonstrate that the proposed method performs favorably in many examples of interest.
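To make the composite-bound idea concrete, the sketch below illustrates how a node's upper and lower bounds could be blended into a single action-selection score and discounted by tree depth. This is a minimal, hypothetical illustration of the mechanism described above, not the authors' implementation; the function names (`ulb_score`, `select_action`) and the parameter values `omega` and `kappa` are assumptions introduced for the example.

```python
# Hypothetical sketch of composite-bound (ULB) action selection with a
# depth-based discount, following the mechanism described in the abstract.
# Names and weights are illustrative assumptions, not the paper's code.

def ulb_score(upper: float, lower: float, omega: float) -> float:
    """Blend a node's upper and lower bounds into one selection score.

    omega in (0, 1) weights the lower bound; omega = 0 recovers a purely
    optimistic (upper-bound-only) search as in standard DESPOT.
    """
    return upper + omega * lower

def depth_discount(depth: int, kappa: float) -> float:
    """Depth-based discount beta = kappa**depth (kappa > 1), used to shrink
    bound gaps propagated from deeper levels of the belief tree."""
    return kappa ** depth

def select_action(children_bounds, depth, omega=0.3, kappa=1.05):
    """Pick the action whose child nodes maximize the discounted ULB score.

    children_bounds maps each action to a list of (upper, lower) pairs,
    one pair per child belief node reachable under that action.
    """
    beta = depth_discount(depth, kappa)
    best_action, best_value = None, float("-inf")
    for action, bounds in children_bounds.items():
        value = sum(ulb_score(u, l, omega) for u, l in bounds) / beta
        if value > best_value:
            best_action, best_value = action, value
    return best_action

# Usage: two actions, each leading to two child nodes with (upper, lower)
# bounds. "right" wins because its lower bounds are much tighter.
bounds = {"left": [(10.0, 4.0), (8.0, 3.0)], "right": [(9.5, 7.0), (9.0, 6.5)]}
print(select_action(bounds, depth=2))  # -> right
```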



Acknowledgments

This work was supported in part by the National Key Research and Development Program of China (No. 2021ZD0114503), the Major Research Plan of the National Natural Science Foundation of China (No. 92148204), the National Natural Science Foundation of China (Nos. 61803089, 61971071, 62027810, 62133005), the Hunan Science Fund for Distinguished Young Scholars (No. 2021JJ10025), the Hunan Key Research and Development Program (Nos. 2021GK4011, 2022GK2011), the Changsha Science and Technology Major Project (No. kh2003026), the Joint Open Foundation of the State Key Laboratory of Robotics (No. 2021-KF-22-17), the Tianjin University-Fuzhou University Independent Innovation Fund (No. TF2022-4), and the China University Industry-University-Research Innovation Fund (No. 2020HYA06006).

Author information

Corresponding author

Correspondence to Hui Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

Proof

Let \( CH(b, a^{*}) \) be the set of all \( b' = \tau(b, a^{*}, z) \) for some \( z \in Z_{b, a^{*}} \), that is, the set of child nodes of b in the DESPOT-DULB tree. If \( E(b) > 0 \), then \( \varepsilon(b) > 0 \), that is, \( \mu(b) - l(b) > 0 \), and thus \( \mu(b) \neq l_0(b) \). Hence we have

$$ \mu(b) = d(b, a^{*}) = \rho(b, a^{*}) + \left( \sum_{b' \in CH(b, a^{*})} \mu(b') + \omega \sum_{b' \in CH(b, a^{*})} l(b') \right) / \beta $$
(A1)

and

$$ l(b) \ge m(b, a^{*}) \ge \rho(b, a^{*}) + \left( \sum_{b' \in CH(b, a^{*})} l(b') + \omega \sum_{b' \in CH(b, a^{*})} \mu(b') \right) / \beta $$
(A2)

where \( 0 < \omega < 1 \) and \( \beta > 1 \). Subtracting Eq. (A2) from Eq. (A1), the \( \rho(b, a^{*}) \) terms cancel and the remaining sums combine with weight \( 1 - \omega \), giving

$$ \mu(b) - l(b) \le (1 - \omega) \sum_{b' \in CH(b, a^{*})} \left[ \mu(b') - l(b') \right] / \beta $$
(A3)

where

$$ \beta = \kappa^{\Delta(b)} $$
(A4)

Since \( 0 < 1 - \omega < 1 \) and \( \kappa > 1 \), we have

$$ (1 - \omega) \sum_{b' \in CH(b, a^{*})} \left[ \mu(b') - l(b') \right] / \beta < \sum_{b' \in CH(b, a^{*})} \left[ \mu(b') - l(b') \right] $$
(A5)

That is

$$ \mu(b) - l(b) \le \sum_{b' \in CH(b, a^{*})} \left[ \mu(b') - l(b') \right] $$
(A6)

Combining Eq. (A4) and Eq. (A6), and noting that each child satisfies \( \Delta(b') = \Delta(b) + 1 \), so that \( \kappa^{\Delta(b)} \le \kappa^{\Delta(b')} \), we have

$$ \kappa^{\Delta(b)} \left[ \mu(b) - l(b) \right] \le \kappa^{\Delta(b)} \sum_{b' \in CH(b, a^{*})} \left[ \mu(b') - l(b') \right] \le \kappa^{\Delta(b')} \sum_{b' \in CH(b, a^{*})} \left[ \mu(b') - l(b') \right] $$
(A7)

Note that

$$ \frac{|\Phi_b|}{K} \xi \varepsilon(b_0) = \sum_{b' \in CH(b, a^{*})} \frac{|\Phi_{b'}|}{K} \xi \varepsilon(b_0) $$
(A8)

Hence, we have

$$ \kappa^{\Delta(b)} \left[ \mu(b) - l(b) \right] - \frac{|\Phi_b|}{K} \xi \varepsilon(b_0) \le \sum_{b' \in CH(b, a^{*})} \left\{ \kappa^{\Delta(b')} \left[ \mu(b') - l(b') \right] - \frac{|\Phi_{b'}|}{K} \xi \varepsilon(b_0) \right\} $$
(A9)

That is, \( E(b) \le \sum_{z \in Z_{b, a^{*}}} E(b') \), where \( b' = \tau(b, a^{*}, z) \).
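As a quick numeric sanity check of the chain (A3)-(A6), the snippet below evaluates both sides of the inequalities for assumed values of \( \omega \), \( \kappa \), the depth \( \Delta(b) \), and the children's bound gaps; the values are illustrative only and do not come from the paper.

```python
# Numeric sanity check of inequalities (A3), (A5), and (A6): with
# 0 < omega < 1 and beta = kappa**Delta(b) > 1, the parent gap
# mu(b) - l(b) is bounded by the discounted sum of the children's gaps,
# which in turn is strictly below the raw sum. All values are assumptions
# chosen only to exercise the inequalities.

omega, kappa, depth = 0.3, 1.2, 3
beta = kappa ** depth                   # Eq. (A4): beta = kappa**Delta(b)
child_gaps = [2.0, 1.5, 0.5]            # mu(b') - l(b') for each child

rhs_a3 = (1 - omega) * sum(child_gaps) / beta   # right-hand side of (A3)
rhs_a6 = sum(child_gaps)                        # right-hand side of (A6)

# (A5): the discounted, (1 - omega)-weighted sum is strictly below the raw sum.
assert rhs_a3 < rhs_a6
print(f"(A3) bound: {rhs_a3:.3f} < (A6) bound: {rhs_a6:.3f}")  # 1.620 < 4.000
```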

Appendix 2

Proof

Let \( U_0'(b) = U_0(b) + \delta \); then \( U_0' \) is an exact upper bound. Let \( \mu_0' \) be the corresponding initial upper bound, and \( \mu' \) be the corresponding upper bound on \( \nu(b) \). Then \( \mu_0' \) is a valid initial upper bound for \( \nu(b) \), and the backup equations ensure that \( \mu'(b) \) is a valid upper bound for \( \nu(b) \). On the other hand, it is easily shown by induction that

$$ \mu(b) + \gamma^{\Delta(b)} \frac{|\Phi_b|}{K} \delta \ge \mu'(b) $$
(B10)

For the special case \( b = b_0 \), where \( \Delta(b_0) = 0 \) and \( \Phi_{b_0} \) contains all \( K \) scenarios, we have

$$ \mu(b_0) + \delta \ge \mu'(b_0) $$
(B11)

Hence, when the algorithm terminates, we have

$$ \mu(b_0) + \delta \ge \mu'(b_0) \ge \nu^{*}(b_0) $$
(B12)

Equivalently,

$$ \nu_{\hat{\pi}} = l(b_0) \ge \nu^{*}(b_0) - \left( \mu(b_0) - l(b_0) \right) - \delta = \nu^{*}(b_0) - \varepsilon(b_0) - \delta $$
(B13)

Equation (B13) holds because the initialization and the computation of the lower bound via the backup equations are exactly those used to find the regularized optimal policy value in the partial DESPOT-DULB tree.
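The guarantee in (B13) can also be checked arithmetically. The snippet below plugs in assumed values for the root bounds and the slack \( \delta \) (illustrative only, not results from the paper) and verifies that the policy value \( l(b_0) \) stays within \( \varepsilon(b_0) + \delta \) of the optimum.

```python
# Worked numeric check of the near-optimality guarantee (B13). The values
# below are illustrative assumptions, not results from the paper.

delta = 0.1                    # slack added to the initial upper bound U0
mu_b0, l_b0 = 5.0, 4.6         # root upper and lower bounds at termination
eps_b0 = mu_b0 - l_b0          # eps(b0) = mu(b0) - l(b0) = 0.4

# (B12): mu(b0) + delta >= nu*(b0); take the worst-case optimum.
nu_star = mu_b0 + delta

# (B13): l(b0) >= nu*(b0) - eps(b0) - delta, i.e. 4.6 >= 5.1 - 0.4 - 0.1.
assert l_b0 >= nu_star - eps_b0 - delta
print(f"l(b0) = {l_b0} >= nu* - eps - delta = {nu_star - eps_b0 - delta:.1f}")
```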

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Chen, Y., Liu, J., Huang, Y. et al. High-efficiency online planning using composite bounds search under partial observation. Appl Intell 53, 8146–8159 (2023). https://doi.org/10.1007/s10489-022-03914-5

