Abstract
Motion planning in uncertain environments is a common challenge and is essential for autonomous robot operation. The determinized sparse partially observable tree (DESPOT) algorithm is a representative approach that performs reasonably well for planning under uncertainty. However, DESPOT may generate low-quality solutions due to inaccurate searches and inefficient belief-tree construction. Therefore, this paper proposes a high-efficiency online planning method built upon the DESPOT algorithm, namely, the DESPOT with discounted upper and lower bounds (DESPOT-DULB) algorithm, to simultaneously improve the efficiency and performance of motion planning. Specifically, a node’s information is represented by combining its upper and lower bounds (ULB) in the forward exploration of the action space to guide the selection of the optimal action. Then, a discount factor based on the depth of the belief tree is introduced to reduce the gap between the upper and lower bounds in both the action space and the observation space. As a result, the proposed method comprehensively represents the information of each node to ensure a near-optimal forward search. Theoretical proofs of the proposed method are provided as well. The simulation results, including comparisons on three representative scenarios and a parameter sensitivity analysis, demonstrate that the proposed method performs favorably in many examples of interest.
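The two ideas the abstract describes, scoring a node by a weighted combination of its upper and lower bounds rather than the upper bound alone, and shrinking the bound gap with a depth-based discount, can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, the weight `omega`, and the discount base `beta` are all illustrative assumptions.

```python
def node_score(upper, lower, omega):
    """Hypothetical combined ULB score: weight the node's upper and lower
    bounds instead of relying on the upper bound alone, so that the
    guaranteed value (lower bound) also influences action selection.
    omega in (0, 1) trades optimism for the guaranteed value."""
    return omega * upper + (1.0 - omega) * lower


def discounted_gap(upper, lower, depth, beta):
    """Hypothetical depth-discounted bound gap: deeper nodes contribute a
    smaller gap, narrowing the distance between the upper and lower
    bounds as the belief tree grows (beta > 1 is an assumed base)."""
    return (upper - lower) / (beta ** depth)


def select_action(children, omega):
    """Pick the action branch with the best combined ULB score.
    `children` maps each candidate action to its (upper, lower) pair."""
    return max(children, key=lambda a: node_score(*children[a], omega))


# Toy usage with made-up bound pairs for three candidate actions: a low
# omega favors the action with the strongest guaranteed (lower) bound.
children = {"left": (10.0, 2.0), "stay": (6.0, 5.0), "right": (9.0, 1.0)}
best = select_action(children, omega=0.3)  # -> "stay"
```

With `omega = 0.3`, "stay" wins (score 5.3) even though "left" has the largest upper bound, which is the kind of correction over purely optimistic search that the ULB combination is meant to provide.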
References
Dai XY, Meng QH, Jin S (2021) Uncertainty-driven active view planning in feature-based monocular vSLAM. Appl Soft Comput 108:107459
Nakrani NM, Joshi MM (2022) A human-like decision intelligence for obstacle avoidance in autonomous vehicle parking. Appl Intell 52(4):1–20
Hubmann C, Schulz J, Becker M, Althoff D, Stiller C (2018) Automated driving in uncertain environments: planning with interaction and uncertain maneuver prediction. IEEE Trans Intell Veh 3(1):5–17
Smallwood R, Sondik E (1973) The optimal control of partially observable Markov processes over a finite horizon. Oper Res 21:1071–1088
Bai H, Cai S, Ye N, Hsu D, Lee WS (2015) Intention-aware online POMDP planning for autonomous driving in a crowd. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp 454–460
Garg NP, Hsu D, Lee WS (2019) Learning to grasp under uncertainty using POMDPs. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp 2751–2757
Wu K, Lee WS, Hsu D (2015) POMDP to the rescue: boosting performance for Robocup rescue. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp 5294–5299
Folsom-Kovarik JT, Sukthankar G, Schatz S (2013) Tractable POMDP representations for intelligent tutoring systems. ACM Trans Intell Syst Technol 4(2):1–22
Kaelbling LP, Littman ML, Cassandra AR (1998) Planning and acting in partially observable stochastic domains. Artif Intell 101(1–2):99–134
Deb S, Tammi K, Gao XZ, Kalita K, Mahanta P, Cross S (2022) A robust two-stage planning model for the charging station placement problem considering road traffic uncertainty. IEEE Trans Intell Transp Syst 23(7):1–15
Sung I, Choi B, Nielsen P (2021) On the training of a neural network for online path planning with offline path planning algorithms. Int J Inf Manag 57:102142
Nicol S, Chadès I (2012) Which states matter? An application of an intelligent discretization method to solve a continuous POMDP in conservation biology. PLoS One 7(2):e28993
Browne CB, Powley E, Whitehouse D, Lucas SM, Cowling PI, Rohlfshagen P, Tavener S, Perez D, Samothrakis S, Colton S (2012) A survey of Monte Carlo tree search methods. IEEE Trans Comput Intell AI in Games 4(1):1–43
Silver D, Veness J (2010) Monte-Carlo planning in large POMDPs. Adv Neural Inf Process Syst 23:2164–2172
Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Mach Learn 47(2–3):235–256
Somani A, Ye N, Hsu D, Lee WS (2013) DESPOT: online POMDP planning with regularization. In: Advances in Neural Information Processing Systems, vol 26
Bougie N, Ichise R (2021) Fast and slow curiosity for high-level exploration in reinforcement learning. Appl Intell 51(2):1086–1107
Chen Y, Kochenderfer MJ, Spaan MTJ (2018) Improving offline value-function approximations for POMDPs by reducing discount factors. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp 3531–3536
Kurniawati H, Hsu D, Lee WS (2008) SARSOP: efficient point-based POMDP planning by approximating optimally reachable belief spaces. Robot: Sci Syst 4:65–72
Bai H, Hsu D, Lee WS (2014) Integrated perception and planning in the continuous space: a POMDP approach. Int J Robot Res 33(9):1288–1302
Zhang Z, Hsu D, Lee WS, Lim ZW, Bai A (2015) PLEASE: palm leaf search for POMDPs with large observation spaces. In: Proceedings of the Twenty-Fifth International Conference on Automated Planning and Scheduling, pp 249–258
Wu B, Zheng HY, Feng YP (2014) Point-based online value iteration algorithm in large POMDP. Appl Intell 40(3):546–555
He R, Brunskill E, Roy N (2011) Efficient planning under uncertainty with macro-actions. J Artif Intell Res 40:523–570
Ross S, Pineau J, Paquet S, Chaib-Draa B (2008) Online planning algorithms for POMDPs. J Artif Intell Res 32:663–704
Zhang S, Sridharan M, Washington C (2013) Active visual planning for mobile robot teams using hierarchical POMDPs. IEEE Trans Robot 29(4):975–985
Koval M, Hsu D, Pollard N, Srinivasa SS (2020) Configuration lattices for planar contact manipulation under uncertainty. In: Proceedings of International Workshop on the Algorithmic Foundations of Robotics, pp. 768–783
Sun K, Schlotfeldt B, Pappas GJ (2020) Stochastic motion planning under partial observability for mobile robots with continuous range measurements. IEEE Trans Robot 37(3):979–995
Vien NA, Ngo H, Lee S, Chung T (2014) Approximate planning for Bayesian hierarchical reinforcement learning. Appl Intell 41(3):808–819
Ye N, Somani A, Hsu D, Lee WS (2017) DESPOT: online POMDP planning with regularization. J Artif Intell Res 58:231–266
Garg NP, Hsu D, Lee WS (2019) DESPOT-alpha: online POMDP planning with large state and observation spaces. Robot: Sci and Syst. https://doi.org/10.15607/RSS.2019.XV.006
Luo Y, Bai H, Hsu D, Lee WS (2019) Importance sampling for online planning under uncertainty. Int J Robot Res 38(2–3):162–181
Cai P, Luo Y, Hsu D, Lee WS (2021) HyP-DESPOT: a hybrid parallel algorithm for online planning under uncertainty. Int J Robot Res 40(2–3):558–573
Wu C, Kong R, Yang G, Kong X, Zhang Z, Yu Y, Liu W (2021) LB-DESPOT: efficient online POMDP planning considering lower bound in action selection. In: Proceedings of the AAAI Conference on Artificial Intelligence 35(18):15927–15928
Yoon S, Fern A, Givan R, Kambhampati S (2008) Probabilistic planning via determinization in hindsight. In: Proceedings of AAAI Conference on Artificial Intelligence 2:1010–1016
Acknowledgments
This work was supported in part by the National Key Research and Development Program of China (No. 2021ZD0114503), the Major Research Plan of the National Natural Science Foundation of China (No. 92148204), the National Natural Science Foundation of China (No. 61803089, 61971071, 62027810, 62133005), the Hunan Science Fund for Distinguished Young Scholars (No. 2021JJ10025), the Hunan Key Research and Development Program (No. 2021GK4011, 2022GK2011), the Changsha Science and Technology Major Project (No. kh2003026), the Joint Open Foundation of the State Key Laboratory of Robotics (No. 2021-KF-22-17), the Tianjin University–Fuzhou University Independent Innovation Fund (No. TF2022-4), and the China University Industry-University-Research Innovation Fund (No. 2020HYA06006).
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix 1
Proof
Let CH(b, a∗) denote the set of all b′ = τ(b, a∗, z) for some \( z\in {Z}_{b,{a}^{\ast }} \), that is, the set of child nodes of b in the DESPOT-DULB tree. If E(b) > 0, then ε(b) > 0, that is, μ(b) − l(b) > 0, and thus μ(b) ≠ l0(b). Hence we have
and
where 0 < ω < 1 and β > 1. Subtracting Eq. (A2) from Eq. (A1), we have
where
Since 0 < 1 − ω < 1 and κ > 1, we have
That is
Combining Eqs. (A4) and (A6), we have
Note that
Hence, we have
That is, \( E(b)\le \sum \limits_{z\in {Z}_{b,{a}^{\ast }}}E\left({b}^{\prime}\right) \).
Appendix 2
Proof
Let U0′(b) = U0(b) + δ; then U0′ is an exact upper bound. Let μ0′ be the corresponding initial upper bound, and μ′ the corresponding upper bound on ν∗(b). Then μ0′ is a valid initial upper bound for ν∗(b), and the backup equations ensure that μ′(b) is a valid upper bound for ν∗(b). On the other hand, it is easily shown by induction that
As a special case, for b = b0, we have
Hence, when the algorithm terminates, we have
Equivalently,
Eq. (B13) holds because the initialization and the computation of the lower bound l via the backup equations are exactly those used to find the value of a regularized optimal policy in the partial DESPOT-DULB tree.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, Y., Liu, J., Huang, Y. et al. High-efficiency online planning using composite bounds search under partial observation. Appl Intell 53, 8146–8159 (2023). https://doi.org/10.1007/s10489-022-03914-5