Point-Based Policy Transformation: Adapting Policy to Changing POMDP Models

Conference paper
Algorithmic Foundations of Robotics X

Part of the book series: Springer Tracts in Advanced Robotics (STAR, volume 86)

Abstract

Motion planning under uncertainty that can efficiently account for changes in the environment is critical for robots to operate reliably in our living spaces. The Partially Observable Markov Decision Process (POMDP) provides a systematic and general framework for motion planning under uncertainty. Point-based POMDP solvers have advanced POMDP planning tremendously over the past few years, making it practical for many simple to moderately difficult robotics problems. However, when environmental changes alter the POMDP model, most existing POMDP planners recompute the solution from scratch, discarding the significant computation already spent on the original problem. In this paper, we propose a novel algorithm, called Point-Based Policy Transformation (PBPT), that solves the altered POMDP problem by transforming the solution of the original problem to accommodate the changes. PBPT uses the point-based POMDP approach: it transforms the original solution by modifying the set of sampled beliefs that represents the belief space B, and then uses this new set of sampled beliefs to revise the original solution. Preliminary results indicate that PBPT generates a good policy for the altered POMDP model in a matter of minutes, whereas recomputing the policy with the fastest offline POMDP planner available today fails to find a policy of similar quality after two hours of planning, even when the policy for the original problem is reused as the initial policy.
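
The abstract only sketches the point-based approach, so a brief illustration may help. Below is a minimal, hypothetical sketch in Python of the core operation that point-based solvers repeat at each sampled belief: the point-based Bellman backup. This is not the authors' PBPT implementation; the function name, the tabular model layout (arrays T, O, R), and the toy usage at the end are assumptions made for illustration. The relevance to the paper is that the policy is represented by alpha-vectors computed at a set of sampled beliefs, so adapting that sampled set to an altered model lets the original alpha-vectors seed the revised solution instead of being discarded.

```python
import numpy as np

def point_based_backup(b, Gamma, T, O, R, gamma):
    """One point-based Bellman backup at belief b (illustrative sketch).

    b     : (S,) belief over states
    Gamma : list of (S,) alpha-vectors representing the current value function
    T     : (A, S, S) transition model, T[a, s, s'] = P(s' | s, a)
    O     : (A, S, nO) observation model, O[a, s', o] = P(o | s', a)
    R     : (S, A) immediate rewards
    gamma : discount factor in (0, 1)
    Returns the backed-up alpha-vector at b and its greedy action.
    """
    A, S, _ = T.shape
    nO = O.shape[2]
    G = np.stack(Gamma)                      # (|Gamma|, S)
    best_val, best_alpha, best_a = -np.inf, None, None
    for a in range(A):
        alpha_a = R[:, a].astype(float)      # immediate reward term
        for o in range(nO):
            # Back-project each alpha-vector through (a, o):
            # g_i(s) = sum_{s'} T[a, s, s'] * O[a, s', o] * alpha_i(s')
            P = T[a] * O[a, :, o][None, :]   # (S, S)
            g = G @ P.T                      # (|Gamma|, S)
            # Keep the back-projected vector that is best at this belief.
            alpha_a += gamma * g[np.argmax(g @ b)]
        v = float(alpha_a @ b)
        if v > best_val:
            best_val, best_alpha, best_a = v, alpha_a, a
    return best_alpha, best_a

# Toy usage on a made-up 2-state, 2-action, 2-observation model:
if __name__ == "__main__":
    S, A, nO = 2, 2, 2
    T = np.full((A, S, S), 0.5)              # uniform transitions
    O = np.full((A, S, nO), 0.5)             # uninformative observations
    R = np.array([[1.0, 0.0], [0.0, 1.0]])   # R[s, a]
    Gamma = [np.zeros(S)]                    # trivial initial value function
    alpha, a = point_based_backup(np.array([0.6, 0.4]), Gamma, T, O, R, 0.95)
    print(a, alpha)
```

A point-based solver applies this backup repeatedly at every belief in its sampled set until the values converge; under the scheme the abstract describes, a model change would prompt revising the sampled belief set and re-running such backups from the existing alpha-vectors rather than starting from scratch.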

Author information

Corresponding author

Correspondence to Hanna Kurniawati.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kurniawati, H., Patrikalakis, N.M. (2013). Point-Based Policy Transformation: Adapting Policy to Changing POMDP Models. In: Frazzoli, E., Lozano-Perez, T., Roy, N., Rus, D. (eds) Algorithmic Foundations of Robotics X. Springer Tracts in Advanced Robotics, vol 86. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36279-8_30

  • DOI: https://doi.org/10.1007/978-3-642-36279-8_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36278-1

  • Online ISBN: 978-3-642-36279-8

  • eBook Packages: Engineering (R0)
