Abstract
Policy iteration is a well-studied algorithm for solving stationary Markov decision processes (MDPs). It has also been extended to robust stationary MDPs. For robust nonstationary MDPs, however, an “as is” execution of this algorithm is not possible because it would call for an infinite amount of computation in each iteration. We therefore present a policy iteration algorithm for robust nonstationary MDPs, which performs finitely implementable approximate variants of policy evaluation and policy improvement in each iteration. We prove that the sequence of cost-to-go functions produced by this algorithm monotonically converges pointwise to the optimal cost-to-go function; the policies generated converge subsequentially to an optimal policy.
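To make the setting concrete, below is a minimal sketch of classical robust policy iteration on a finite *stationary* MDP with a finite uncertainty set of transition kernels, the scheme this paper extends. It is illustrative only: the paper's contribution, finitely implementable approximate evaluation and improvement for the nonstationary case, is not reproduced here, and the function name, signature, and toy data are all assumptions.

```python
import numpy as np

def robust_policy_iteration(costs, kernels, gamma=0.9, tol=1e-8):
    """Robust policy iteration for a finite stationary MDP (cost minimization).

    costs:   (S, A) array of one-step costs.
    kernels: list of (S, A, S) transition tensors; nature picks the worst
             kernel per state-action pair ((s, a)-rectangular uncertainty).
    """
    S, A = costs.shape
    policy = np.zeros(S, dtype=int)
    while True:
        # Robust policy evaluation: iterate the worst-case Bellman operator
        # for the fixed policy until the values stop changing.
        V = np.zeros(S)
        while True:
            worst = np.stack([P[np.arange(S), policy] @ V for P in kernels]).max(axis=0)
            TV = costs[np.arange(S), policy] + gamma * worst
            if np.max(np.abs(TV - V)) < tol:
                V = TV
                break
            V = TV
        # Robust policy improvement: greedy with respect to worst-case Q-values.
        Q = np.stack([costs + gamma * P @ V for P in kernels]).max(axis=0)
        new_policy = Q.argmin(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

# Toy 2-state, 2-action instance with two candidate kernels (illustrative).
costs = np.array([[1.0, 2.0], [0.5, 3.0]])
P1 = np.full((2, 2, 2), 0.5)
P2 = np.array([[[0.9, 0.1], [0.2, 0.8]],
               [[0.3, 0.7], [0.6, 0.4]]])
pi, V = robust_policy_iteration(costs, [P1, P2])
print("robust-optimal policy:", pi, "robust values:", np.round(V, 3))
```

Under rectangular uncertainty the worst-case Bellman operator remains a discounted contraction, so the inner evaluation loop converges and the outer loop terminates after finitely many policy changes. In the nonstationary setting of the paper there are countably many decision epochs, which is exactly why such exact evaluation ceases to be finitely implementable and approximate variants are needed.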
Notes
Most MDPs discussed in this paper are finite-state, finite-action, and infinite-horizon; we therefore omit these qualifiers throughout, except where they are essential for clarity.
Acknowledgments
Funded in part by the National Science Foundation through grant #CMMI 1333260.
Cite this article
Sinha, S., Ghate, A. Policy iteration for robust nonstationary Markov decision processes. Optim Lett 10, 1613–1628 (2016). https://doi.org/10.1007/s11590-016-1040-6