Abstract
The partially observable Markov decision process (POMDP) provides a principled general framework for robot planning under uncertainty. Leveraging the idea of Monte Carlo sampling, recent POMDP planning algorithms have scaled up to various challenging robotic tasks, including real-time online planning for autonomous vehicles. To further improve online planning performance, this paper presents IS-DESPOT, which introduces importance sampling to DESPOT, a state-of-the-art sampling-based POMDP algorithm for planning under uncertainty. Importance sampling improves planning performance when there are critical but rare events that are difficult to sample. We prove that IS-DESPOT retains the theoretical guarantee of DESPOT. We present a general method for learning the importance sampling distribution and demonstrate empirically that importance sampling significantly improves the performance of online POMDP planning for suitable tasks.
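The core idea the abstract refers to, estimating quantities driven by critical but rare events by sampling from a proposal distribution and reweighting, can be illustrated outside the POMDP setting. The sketch below is a generic importance-sampling example (not the IS-DESPOT algorithm itself): it estimates the rare-event probability P(X > 4) for a standard normal X, comparing plain Monte Carlo against sampling from a hypothetical proposal N(4, 1) centred on the rare region.

```python
import math
import random

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def estimate_rare_event(n=100_000, seed=0):
    """Estimate P(X > 4) for X ~ N(0, 1) two ways; true value ~3.17e-5."""
    rng = random.Random(seed)

    # Plain Monte Carlo: the event is so rare that 100k samples from
    # N(0, 1) typically contain only a handful of hits (often zero),
    # so the estimate has enormous relative variance.
    plain = sum(rng.gauss(0.0, 1.0) > 4.0 for _ in range(n)) / n

    # Importance sampling: draw from a proposal q = N(4, 1) centred on
    # the rare region, and reweight each hit by the likelihood ratio
    # p(x) / q(x), which keeps the estimator unbiased.
    total = 0.0
    for _ in range(n):
        x = rng.gauss(4.0, 1.0)  # sample from the proposal q
        if x > 4.0:
            total += normal_pdf(x) / normal_pdf(x, mu=4.0)
    return plain, total / n
```

Because roughly half the proposal samples land in the rare region, the reweighted estimator concentrates tightly around the true probability, whereas the plain estimator is dominated by sampling noise. Choosing the proposal distribution well is the crux, which is why the paper learns it from data.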
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this chapter
Luo, Y., Bai, H., Hsu, D., Lee, W.S. (2020). Importance Sampling for Online Planning under Uncertainty. In: Goldberg, K., Abbeel, P., Bekris, K., Miller, L. (eds) Algorithmic Foundations of Robotics XII. Springer Proceedings in Advanced Robotics, vol 13. Springer, Cham. https://doi.org/10.1007/978-3-030-43089-4_16
DOI: https://doi.org/10.1007/978-3-030-43089-4_16
Print ISBN: 978-3-030-43088-7
Online ISBN: 978-3-030-43089-4
eBook Packages: Intelligent Technologies and Robotics (R0)