
Policy iteration for bounded-parameter POMDPs


Abstract

The partially observable Markov decision process (POMDP) is a basic model for decision making under uncertainty. The bounded-parameter POMDP (BPOMDP) generalizes the exact POMDP model by specifying only upper and lower bounds on the state-transition probabilities, observation probabilities, and rewards, which makes it particularly suitable for situations where the underlying model is imprecisely known or time-varying. This paper presents the optimistic optimality criterion for solving BPOMDPs, under which the optimistically optimal value function is defined. By representing a policy explicitly as a finite-state controller, we propose a policy iteration approach that is shown to converge to an \(\epsilon\)-optimal policy under the optimistic optimality criterion.
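To make the objects in the abstract concrete, the following is a minimal Python sketch of a bounded-parameter POMDP, a finite-state controller, and the standard greedy construction for choosing the value-maximizing transition distribution inside interval bounds, which is one natural reading of the optimistic criterion. The names (BPOMDP, FSC, optimistic_distribution) and the array shapes are illustrative assumptions, not the paper's code.

```python
# Minimal, illustrative sketch (not the authors' implementation) of a
# bounded-parameter POMDP, a finite-state controller policy, and an
# optimistic (value-maximizing) choice of parameters within the bounds.
from dataclasses import dataclass
import numpy as np


@dataclass
class BPOMDP:
    # For every parameter of an exact POMDP we keep a lower/upper bound.
    T_lo: np.ndarray   # lower bounds on P(s' | s, a), shape (S, A, S)
    T_hi: np.ndarray   # upper bounds on P(s' | s, a), shape (S, A, S)
    O_lo: np.ndarray   # lower bounds on P(o | s', a), shape (S, A, O)
    O_hi: np.ndarray   # upper bounds on P(o | s', a), shape (S, A, O)
    R_lo: np.ndarray   # lower bounds on R(s, a),      shape (S, A)
    R_hi: np.ndarray   # upper bounds on R(s, a),      shape (S, A)
    gamma: float       # discount factor in (0, 1)


@dataclass
class FSC:
    # A deterministic finite-state controller: each node picks an action,
    # and each (node, observation) pair picks a successor node.
    action: list       # action[n]       = action chosen at controller node n
    successor: list    # successor[n][o] = next node after observing o at node n


def optimistic_distribution(lo, hi, v):
    """Pick a distribution within the componentwise bounds [lo, hi] that
    sums to 1 and maximizes the expected value of v.

    This greedy construction (push as much probability mass as possible
    toward high-value successor states) is the standard device for
    interval / bounded-parameter MDPs; whether the paper uses exactly
    this routine is an assumption here.
    """
    p = lo.astype(float)                # start from the lower bounds
    slack = 1.0 - p.sum()               # probability mass still unassigned
    for s in np.argsort(-v):            # successors in decreasing value order
        add = min(hi[s] - lo[s], slack)
        p[s] += add
        slack -= add
        if slack <= 1e-12:
            break
    return p
```

Under the optimistic criterion, a policy iteration scheme would then alternate between evaluating the current controller against such value-maximizing parameter choices and improving the controller, as the abstract indicates; the precise evaluation and improvement steps are detailed in the paper.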



Notes

  1. In the remainder of this paper, we refer to a policy as an FSC and thus denote both by p.


Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 71101027), the Program for Young Excellent Talents, UIBE (No. 12YQ08), a GRF grant from the RGC UGC of Hong Kong (GRF Project No. 9041574), and a grant from City University of Hong Kong (Project No. 7008026).

Author information

Corresponding author

Correspondence to Yaodong Ni.


About this article

Cite this article

Ni, Y., Liu, ZQ. Policy iteration for bounded-parameter POMDPs. Soft Comput 17, 537–548 (2013). https://doi.org/10.1007/s00500-012-0932-3
