Abstract
The POMDP is a basic model for decision making under uncertainty. The bounded-parameter POMDP (BPOMDP) generalizes the exact POMDP by specifying only upper and lower bounds on the state-transition probabilities, observation probabilities, and rewards, which makes it particularly suitable for situations where the underlying model is imprecisely known or time-varying. This paper presents the optimistic criterion of optimality for solving BPOMDPs, under which the optimistically optimal value function is defined. By representing a policy explicitly as a finite-state controller, we propose a policy iteration approach that is shown to converge to an \(\epsilon\)-optimal policy under the optimistic optimality criterion.
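The optimistic criterion can be illustrated on the simpler bounded-parameter MDP case: for each state-action pair, the optimistic backup chooses, among all transition distributions consistent with the interval bounds, the one that maximizes expected next-state value. A minimal sketch under that assumption follows; the function name and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def optimistic_backup(V, p_lo, p_hi, reward, gamma=0.95):
    """One optimistic Bellman backup for a single (state, action) pair of a
    bounded-parameter MDP. p_lo/p_hi bound the next-state distribution; the
    optimistic model is the distribution within those bounds (summing to 1)
    that maximizes the expected next-state value."""
    order = np.argsort(-V)     # next states, highest value first
    p = p_lo.copy()            # start every state at its lower bound
    mass = 1.0 - p.sum()       # probability mass left to distribute
    for j in order:            # pour remaining mass into high-value states
        give = min(mass, p_hi[j] - p_lo[j])
        p[j] += give
        mass -= give
        if mass <= 0:
            break
    return reward + gamma * float(p @ V)

# Example: two next states, V = [1, 0], bounds [0.2, 0.8] on each.
# The optimistic distribution is [0.8, 0.2], giving 0 + 0.95 * 0.8 = 0.76.
v = optimistic_backup(np.array([1.0, 0.0]),
                      np.array([0.2, 0.2]),
                      np.array([0.8, 0.8]),
                      reward=0.0)
```

The greedy fill is the standard order-maximizing construction for interval transition bounds; a pessimistic backup would instead sort states by ascending value.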
Notes
In the remainder of this paper, we refer to a policy as an FSC and denote both by p.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 71101027), the Program for Young Excellent Talents, UIBE (No. 12YQ08), a GRF grant from the RGC, UGC Hong Kong (GRF Project No. 9041574), and a grant from City University of Hong Kong (Project No. 7008026).
Cite this article
Ni, Y., Liu, ZQ. Policy iteration for bounded-parameter POMDPs. Soft Comput 17, 537–548 (2013). https://doi.org/10.1007/s00500-012-0932-3