Abstract
The POMDP is a basic model for decision making under uncertainty. The bounded-parameter POMDP (BPOMDP) generalizes the exact POMDP by specifying only upper and lower bounds on the state-transition probabilities, observation probabilities, and rewards, which makes it particularly suitable for situations where the underlying model is imprecisely known or time-varying. This paper presents the optimistic criterion of optimality for solving BPOMDPs, under which the optimistically optimal value function is defined. By representing a policy explicitly as a finite-state controller, we propose a policy iteration approach that is shown to converge to an \(\epsilon\)-optimal policy under the optimistic optimality criterion.
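The optimistic criterion can be illustrated on the simpler bounded-parameter MDP case: for each state-action pair, the optimistic backup chooses, among all transition distributions consistent with the interval bounds, the one that maximizes expected next-state value. A minimal sketch under that assumption follows; the function name and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def optimistic_backup(V, p_lo, p_hi, reward, gamma=0.95):
    """One optimistic Bellman backup for a single (state, action) pair of a
    bounded-parameter MDP. p_lo/p_hi bound the next-state distribution; the
    optimistic model is the distribution within those bounds (summing to 1)
    that maximizes the expected next-state value."""
    order = np.argsort(-V)     # next states, highest value first
    p = p_lo.copy()            # start every state at its lower bound
    mass = 1.0 - p.sum()       # probability mass left to distribute
    for j in order:            # pour remaining mass into high-value states
        give = min(mass, p_hi[j] - p_lo[j])
        p[j] += give
        mass -= give
        if mass <= 0:
            break
    return reward + gamma * float(p @ V)

# Example: two next states, V = [1, 0], bounds [0.2, 0.8] on each.
# The optimistic distribution is [0.8, 0.2], giving 0 + 0.95 * 0.8 = 0.76.
v = optimistic_backup(np.array([1.0, 0.0]),
                      np.array([0.2, 0.2]),
                      np.array([0.8, 0.8]),
                      reward=0.0)
```

The greedy fill is the standard order-maximizing construction for interval transition bounds; a pessimistic backup would instead sort states by ascending value.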
Notes
In the remainder of this paper, we refer to a policy as an FSC and denote both by p.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 71101027), the Program for Young Excellent Talents, UIBE (No. 12YQ08), a GRF grant from the RGC, UGC Hong Kong (GRF Project No. 9041574), and a grant from City University of Hong Kong (Project No. 7008026).
Cite this article
Ni, Y., Liu, ZQ. Policy iteration for bounded-parameter POMDPs. Soft Comput 17, 537–548 (2013). https://doi.org/10.1007/s00500-012-0932-3