Point-Based Policy Transformation: Adapting Policy to Changing POMDP Models

Conference paper
Algorithmic Foundations of Robotics X

Part of the book series: Springer Tracts in Advanced Robotics (STAR, volume 86)

Abstract

Motion planning under uncertainty that can efficiently account for changes in the environment is critical for robots to operate reliably in our living spaces. The Partially Observable Markov Decision Process (POMDP) provides a systematic and general framework for motion planning under uncertainty. Point-based POMDP solvers have advanced POMDP planning tremendously over the past few years, making it practical for many simple to moderately difficult robotics problems. However, when environmental changes alter the POMDP model, most existing POMDP planners recompute the solution from scratch, discarding the significant computation already spent on the original problem. In this paper, we propose a novel algorithm, called Point-Based Policy Transformation (PBPT), that solves the altered POMDP problem by transforming the solution of the original problem to accommodate the changes. PBPT uses the point-based POMDP approach: it transforms the original solution by modifying the set of sampled beliefs that represents the belief space B, and then uses this new set of sampled beliefs to revise the original solution. Preliminary results indicate that PBPT generates a good policy for the altered POMDP model in a matter of minutes, whereas recomputing the policy with the fastest offline POMDP planner available today fails to find a policy of similar quality after two hours of planning, even when the policy for the original problem is reused as the initial policy.
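
The abstract only sketches the point-based approach, so a brief illustration may help. Below is a minimal, hypothetical sketch in Python of the core operation that point-based solvers repeat at each sampled belief: the point-based Bellman backup. This is not the authors' PBPT implementation; the function name, the tabular model layout (arrays T, O, R), and the toy usage at the end are assumptions made for illustration. The relevance to the paper is that the policy is represented by alpha-vectors computed at a set of sampled beliefs, so adapting that sampled set to an altered model lets the original alpha-vectors seed the revised solution instead of being discarded.

```python
import numpy as np

def point_based_backup(b, Gamma, T, O, R, gamma):
    """One point-based Bellman backup at belief b (illustrative sketch).

    b     : (S,) belief over states
    Gamma : list of (S,) alpha-vectors representing the current value function
    T     : (A, S, S) transition model, T[a, s, s'] = P(s' | s, a)
    O     : (A, S, nO) observation model, O[a, s', o] = P(o | s', a)
    R     : (S, A) immediate rewards
    gamma : discount factor in (0, 1)
    Returns the backed-up alpha-vector at b and its greedy action.
    """
    A, S, _ = T.shape
    nO = O.shape[2]
    G = np.stack(Gamma)                      # (|Gamma|, S)
    best_val, best_alpha, best_a = -np.inf, None, None
    for a in range(A):
        alpha_a = R[:, a].astype(float)      # immediate reward term
        for o in range(nO):
            # Back-project each alpha-vector through (a, o):
            # g_i(s) = sum_{s'} T[a, s, s'] * O[a, s', o] * alpha_i(s')
            P = T[a] * O[a, :, o][None, :]   # (S, S)
            g = G @ P.T                      # (|Gamma|, S)
            # Keep the back-projected vector that is best at this belief.
            alpha_a += gamma * g[np.argmax(g @ b)]
        v = float(alpha_a @ b)
        if v > best_val:
            best_val, best_alpha, best_a = v, alpha_a, a
    return best_alpha, best_a

# Toy usage on a made-up 2-state, 2-action, 2-observation model:
if __name__ == "__main__":
    S, A, nO = 2, 2, 2
    T = np.full((A, S, S), 0.5)              # uniform transitions
    O = np.full((A, S, nO), 0.5)             # uninformative observations
    R = np.array([[1.0, 0.0], [0.0, 1.0]])   # R[s, a]
    Gamma = [np.zeros(S)]                    # trivial initial value function
    alpha, a = point_based_backup(np.array([0.6, 0.4]), Gamma, T, O, R, 0.95)
    print(a, alpha)
```

A point-based solver applies this backup repeatedly at every belief in its sampled set until the values converge; under the scheme the abstract describes, a model change would prompt revising the sampled belief set and re-running such backups from the existing alpha-vectors rather than starting from scratch.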

Author information

Corresponding author

Correspondence to Hanna Kurniawati.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kurniawati, H., Patrikalakis, N.M. (2013). Point-Based Policy Transformation: Adapting Policy to Changing POMDP Models. In: Frazzoli, E., Lozano-Perez, T., Roy, N., Rus, D. (eds) Algorithmic Foundations of Robotics X. Springer Tracts in Advanced Robotics, vol 86. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36279-8_30

  • DOI: https://doi.org/10.1007/978-3-642-36279-8_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36278-1

  • Online ISBN: 978-3-642-36279-8

  • eBook Packages: Engineering (R0)
