Robust Markov Decision Processes: A Place Where AI and Formal Methods Meet

In: Principles of Verification: Cycling the Probabilistic Landscape

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15262)


Abstract

Markov decision processes (MDPs) are a standard model for sequential decision-making problems and are widely used across many scientific areas, including formal methods and artificial intelligence (AI). MDPs do, however, come with the restrictive assumption that the transition probabilities must be precisely known. Robust MDPs (RMDPs) overcome this assumption by instead requiring only that the transition probabilities belong to some uncertainty set. We present a gentle, tutorial-style survey of RMDPs, covering their fundamentals. In particular, we discuss RMDP semantics and how to solve RMDPs by extending standard MDP methods such as value iteration and policy iteration. We also discuss how RMDPs relate to other models and how they are used in several contexts, including reinforcement learning and abstraction techniques. We conclude with some challenges for future work on RMDPs.
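To make the "extending value iteration" point concrete, below is a minimal, hypothetical Python sketch of robust value iteration for an interval MDP, the simplest (s, a)-rectangular RMDP, where each transition probability is only known to lie in an interval. The model, function names, and numbers are illustrative assumptions, not code from the chapter; the adversary's inner problem is solved with the standard greedy argument of pushing leftover probability mass onto low-value successors first.

```python
import numpy as np

def worst_case_expectation(lo, hi, values):
    """Adversary's inner problem for one (s, a) pair: pick a distribution p
    with lo <= p <= hi and sum(p) = 1 that minimizes p . values.
    Greedy solution: start at the lower bounds, then assign the remaining
    probability mass to the lowest-valued successors first."""
    p = lo.copy()
    slack = 1.0 - p.sum()              # assumes sum(lo) <= 1 <= sum(hi)
    for i in np.argsort(values):       # successors in increasing order of value
        give = min(hi[i] - p[i], slack)
        p[i] += give
        slack -= give
        if slack <= 1e-12:
            break
    return p @ values

def robust_value_iteration(lo, hi, reward, gamma=0.95, tol=1e-8):
    """lo, hi: (S, A, S) interval bounds on P(s' | s, a); reward: (S, A).
    Returns the optimal robust (worst-case) value function."""
    S, A, _ = lo.shape
    V = np.zeros(S)
    while True:
        # Robust Bellman update: the agent maximizes, nature minimizes.
        Q = np.array([[reward[s, a] + gamma *
                       worst_case_expectation(lo[s, a], hi[s, a], V)
                       for a in range(A)] for s in range(S)])
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Tiny made-up example: 2 states, 2 actions, valid interval bounds.
lo = np.array([[[0.6, 0.2], [0.1, 0.7]],
               [[0.3, 0.5], [0.0, 0.8]]])
hi = np.array([[[0.8, 0.4], [0.3, 0.9]],
               [[0.5, 0.7], [0.2, 1.0]]])
reward = np.array([[1.0, 0.5],
                   [0.0, 2.0]])
print(robust_value_iteration(lo, hi, reward))
```

Robust policy iteration follows the same pattern: policy evaluation becomes a minimization over the uncertainty set rather than an expectation under a single fixed transition function.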



Acknowledgements

This work was supported by the ERC Starting Grant 101077178 (DEUCE) and the European Union's Horizon 2020 research and innovation programme (FUN2MODEL, grant agreement No. 834115), as well as the NWO grants OCENW.KLEIN.187 and NWA.1160.18.238 (PrimaVera).

Author information


Corresponding author

Correspondence to Marnix Suilen.



Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Suilen, M., Badings, T., Bovy, E.M., Parker, D., Jansen, N. (2025). Robust Markov Decision Processes: A Place Where AI and Formal Methods Meet. In: Jansen, N., et al. Principles of Verification: Cycling the Probabilistic Landscape. Lecture Notes in Computer Science, vol 15262. Springer, Cham. https://doi.org/10.1007/978-3-031-75778-5_7

  • DOI: https://doi.org/10.1007/978-3-031-75778-5_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-75777-8

  • Online ISBN: 978-3-031-75778-5

  • eBook Packages: Computer Science; Computer Science (R0)
