
Policy iteration for robust nonstationary Markov decision processes

  • Original Paper
  • Published in: Optimization Letters

Abstract

Policy iteration is a well-studied algorithm for solving stationary Markov decision processes (MDPs). It has also been extended to robust stationary MDPs. For robust nonstationary MDPs, however, an “as is” execution of this algorithm is not possible because it would call for an infinite amount of computation in each iteration. We therefore present a policy iteration algorithm for robust nonstationary MDPs, which performs finitely implementable approximate variants of policy evaluation and policy improvement in each iteration. We prove that the sequence of cost-to-go functions produced by this algorithm monotonically converges pointwise to the optimal cost-to-go function; the policies generated converge subsequentially to an optimal policy.
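To make the two approximate steps concrete, here is a minimal, self-contained sketch of robust policy iteration for a *stationary* finite MDP whose transition probabilities are only known to lie in a finite uncertainty set of kernels. This is an illustration of the classical robust setting the paper builds on, not the paper's nonstationary algorithm; all names, the toy data, and the iteration counts are assumptions chosen for the example. Policy evaluation is carried out by a finite number of successive-approximation sweeps (mirroring the idea of finitely implementable approximate evaluation), and improvement minimizes the worst-case one-step cost-to-go.

```python
# Illustrative sketch: robust policy iteration for a 2-state, 2-action MDP
# with a finite uncertainty set of transition kernels (worst-case costs).
# All data below is a toy example, not from the paper.

ALPHA = 0.9  # discount factor

# Immediate cost c(s, a), indexed cost[s][a]
cost = [[1.0, 2.0],
        [0.5, 1.5]]

# Uncertainty set: two candidate kernels; kernels[k][s][a] is a
# probability distribution over next states.
kernels = [
    [[[0.8, 0.2], [0.3, 0.7]],
     [[0.5, 0.5], [0.9, 0.1]]],
    [[[0.6, 0.4], [0.4, 0.6]],
     [[0.2, 0.8], [0.7, 0.3]]],
]

def robust_q(v, s, a):
    """Worst-case (max over kernels) discounted one-step cost-to-go."""
    worst = max(sum(p * v[t] for t, p in enumerate(k[s][a])) for k in kernels)
    return cost[s][a] + ALPHA * worst

def evaluate(policy, sweeps=200):
    """Approximate robust policy evaluation by successive approximation.

    A finite number of sweeps of the robust evaluation operator; since the
    operator is a contraction with modulus ALPHA, the error after `sweeps`
    iterations is geometrically small.
    """
    v = [0.0, 0.0]
    for _ in range(sweeps):
        v = [robust_q(v, s, policy[s]) for s in range(2)]
    return v

def robust_policy_iteration(max_iter=50):
    """Alternate approximate evaluation and robust improvement."""
    policy = [0, 0]
    for _ in range(max_iter):
        v = evaluate(policy)
        # Improvement: pick the action minimizing worst-case cost-to-go.
        new_policy = [min(range(2), key=lambda a: robust_q(v, s, a))
                      for s in range(2)]
        if new_policy == policy:
            break
        policy = new_policy
    return policy, evaluate(policy)

pi, v = robust_policy_iteration()
```

In the nonstationary setting treated by the paper, the analogue of `evaluate` cannot sum over an infinite horizon exactly, which is precisely why the authors replace both steps with finitely implementable approximate variants.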


Notes

  1. Most MDPs discussed in this paper are finite-state, finite-action and infinite-horizon; we therefore omit such qualifiers for brevity throughout, unless they are essential for clarity.


Acknowledgments

Funded in part by the National Science Foundation through grant #CMMI 1333260.

Author information


Corresponding author

Correspondence to Archis Ghate.


About this article


Cite this article

Sinha, S., Ghate, A. Policy iteration for robust nonstationary Markov decision processes. Optim Lett 10, 1613–1628 (2016). https://doi.org/10.1007/s11590-016-1040-6

