Abstract
Markov decision processes (MDPs) are a standard model for sequential decision-making problems and are widely used across many scientific areas, including formal methods and artificial intelligence (AI). MDPs do, however, come with the restrictive assumption that the transition probabilities need to be precisely known. Robust MDPs (RMDPs) overcome this assumption by instead defining the transition probabilities to belong to some uncertainty set. We present a gentle survey on RMDPs, providing a tutorial covering their fundamentals. In particular, we discuss RMDP semantics and how to solve them by extending standard MDP methods such as value iteration and policy iteration. We also discuss how RMDPs relate to other models and how they are used in several contexts, including reinforcement learning and abstraction techniques. We conclude with some challenges for future work on RMDPs.
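As a concrete illustration of the solution methods the chapter surveys, below is a minimal sketch of robust (pessimistic) value iteration for an interval MDP under the standard (s, a)-rectangularity assumption. It is not the chapter's own implementation: the function names, the NumPy representation, and the lo/hi interval arrays are illustrative assumptions.

import numpy as np

def worst_case_expectation(v, lo, hi):
    # Inner adversarial problem for interval uncertainty:
    #   minimize p . v  subject to  lo <= p <= hi,  sum(p) = 1.
    # Assumes feasibility: lo.sum() <= 1 <= hi.sum().
    # Solved greedily: start from the lower bounds and push the
    # remaining probability mass onto the lowest-value successors.
    p = lo.copy()
    budget = 1.0 - lo.sum()
    for i in np.argsort(v):
        extra = min(hi[i] - lo[i], budget)
        p[i] += extra
        budget -= extra
        if budget <= 0.0:
            break
    return p @ v

def robust_value_iteration(reward, lo, hi, gamma=0.95, eps=1e-6):
    # Pessimistic value iteration for an interval MDP with
    # (s, a)-rectangular uncertainty.
    # reward: (S, A) array; lo, hi: (S, A, S) interval bounds.
    S, A = reward.shape
    v = np.zeros(S)
    while True:
        q = np.array([[reward[s, a] + gamma *
                       worst_case_expectation(v, lo[s, a], hi[s, a])
                       for a in range(A)] for s in range(S)])
        v_new = q.max(axis=1)  # agent maximizes, nature minimizes
        if np.max(np.abs(v_new - v)) < eps:
            return v_new, q.argmax(axis=1)  # robust value, greedy policy
        v = v_new

In practice, the interval bounds lo and hi might be obtained from data, for instance via concentration inequalities such as Hoeffding's bound.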
Acknowledgements
This work was supported by the ERC Starting Grant 101077178 (DEUCE) and the European Union's Horizon 2020 research and innovation programme (FUN2MODEL, grant agreement No. 834115), as well as the NWO grants OCENW.KLEIN.187 and NWA.1160.18.238 (PrimaVera).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Suilen, M., Badings, T., Bovy, E.M., Parker, D., Jansen, N. (2025). Robust Markov Decision Processes: A Place Where AI and Formal Methods Meet. In: Jansen, N., et al. Principles of Verification: Cycling the Probabilistic Landscape. Lecture Notes in Computer Science, vol 15262. Springer, Cham. https://doi.org/10.1007/978-3-031-75778-5_7
DOI: https://doi.org/10.1007/978-3-031-75778-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-75777-8
Online ISBN: 978-3-031-75778-5
eBook Packages: Computer Science, Computer Science (R0)