Abstract
Policy iteration is a well-studied algorithm for solving stationary Markov decision processes (MDPs). It has also been extended to robust stationary MDPs. For robust nonstationary MDPs, however, an “as is” execution of this algorithm is not possible because it would call for an infinite amount of computation in each iteration. We therefore present a policy iteration algorithm for robust nonstationary MDPs, which performs finitely implementable approximate variants of policy evaluation and policy improvement in each iteration. We prove that the sequence of cost-to-go functions produced by this algorithm monotonically converges pointwise to the optimal cost-to-go function; the policies generated converge subsequentially to an optimal policy.
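To make the setting concrete, below is a minimal sketch of classical robust policy iteration on a finite *stationary* MDP with a finite uncertainty set of transition kernels, the scheme this paper extends. It is illustrative only: the paper's contribution, finitely implementable approximate evaluation and improvement for the nonstationary case, is not reproduced here, and the function name, signature, and toy data are all assumptions.

```python
import numpy as np

def robust_policy_iteration(costs, kernels, gamma=0.9, tol=1e-8):
    """Robust policy iteration for a finite stationary MDP (cost minimization).

    costs:   (S, A) array of one-step costs.
    kernels: list of (S, A, S) transition tensors; nature picks the worst
             kernel per state-action pair ((s, a)-rectangular uncertainty).
    """
    S, A = costs.shape
    policy = np.zeros(S, dtype=int)
    while True:
        # Robust policy evaluation: iterate the worst-case Bellman operator
        # for the fixed policy until the values stop changing.
        V = np.zeros(S)
        while True:
            worst = np.stack([P[np.arange(S), policy] @ V for P in kernels]).max(axis=0)
            TV = costs[np.arange(S), policy] + gamma * worst
            if np.max(np.abs(TV - V)) < tol:
                V = TV
                break
            V = TV
        # Robust policy improvement: greedy with respect to worst-case Q-values.
        Q = np.stack([costs + gamma * P @ V for P in kernels]).max(axis=0)
        new_policy = Q.argmin(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

# Toy 2-state, 2-action instance with two candidate kernels (illustrative).
costs = np.array([[1.0, 2.0], [0.5, 3.0]])
P1 = np.full((2, 2, 2), 0.5)
P2 = np.array([[[0.9, 0.1], [0.2, 0.8]],
               [[0.3, 0.7], [0.6, 0.4]]])
pi, V = robust_policy_iteration(costs, [P1, P2])
print("robust-optimal policy:", pi, "robust values:", np.round(V, 3))
```

Under rectangular uncertainty the worst-case Bellman operator remains a discounted contraction, so the inner evaluation loop converges and the outer loop terminates after finitely many policy changes. In the nonstationary setting of the paper there are countably many decision epochs, which is exactly why such exact evaluation ceases to be finitely implementable and approximate variants are needed.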
Notes
Most MDPs discussed in this paper are finite-state, finite-action, and infinite-horizon; we therefore omit these qualifiers throughout, except where they are essential for clarity.
Acknowledgments
Funded in part by the National Science Foundation through grant #CMMI 1333260.
Cite this article
Sinha, S., Ghate, A. Policy iteration for robust nonstationary Markov decision processes. Optim Lett 10, 1613–1628 (2016). https://doi.org/10.1007/s11590-016-1040-6