Robust Markov Decision Processes: A Place Where AI and Formal Methods Meet

In: Principles of Verification: Cycling the Probabilistic Landscape

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15262)


Abstract

Markov decision processes (MDPs) are a standard model for sequential decision-making problems and are widely used across many scientific areas, including formal methods and artificial intelligence (AI). MDPs do, however, come with the restrictive assumption that the transition probabilities must be precisely known. Robust MDPs (RMDPs) overcome this assumption by instead requiring only that the transition probabilities belong to some uncertainty set. We present a gentle, tutorial-style survey of RMDPs, covering their fundamentals. In particular, we discuss RMDP semantics and how to solve RMDPs by extending standard MDP methods such as value iteration and policy iteration. We also discuss how RMDPs relate to other models and how they are used in several contexts, including reinforcement learning and abstraction techniques. We conclude with some challenges for future work on RMDPs.
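To make the "extending value iteration" point concrete, below is a minimal, hypothetical Python sketch of robust value iteration for an interval MDP, the simplest (s, a)-rectangular RMDP, where each transition probability is only known to lie in an interval. The model, function names, and numbers are illustrative assumptions, not code from the chapter; the adversary's inner problem is solved with the standard greedy argument of pushing leftover probability mass onto low-value successors first.

```python
import numpy as np

def worst_case_expectation(lo, hi, values):
    """Adversary's inner problem for one (s, a) pair: pick a distribution p
    with lo <= p <= hi and sum(p) = 1 that minimizes p . values.
    Greedy solution: start at the lower bounds, then assign the remaining
    probability mass to the lowest-valued successors first."""
    p = lo.copy()
    slack = 1.0 - p.sum()              # assumes sum(lo) <= 1 <= sum(hi)
    for i in np.argsort(values):       # successors in increasing order of value
        give = min(hi[i] - p[i], slack)
        p[i] += give
        slack -= give
        if slack <= 1e-12:
            break
    return p @ values

def robust_value_iteration(lo, hi, reward, gamma=0.95, tol=1e-8):
    """lo, hi: (S, A, S) interval bounds on P(s' | s, a); reward: (S, A).
    Returns the optimal robust (worst-case) value function."""
    S, A, _ = lo.shape
    V = np.zeros(S)
    while True:
        # Robust Bellman update: the agent maximizes, nature minimizes.
        Q = np.array([[reward[s, a] + gamma *
                       worst_case_expectation(lo[s, a], hi[s, a], V)
                       for a in range(A)] for s in range(S)])
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Tiny made-up example: 2 states, 2 actions, valid interval bounds.
lo = np.array([[[0.6, 0.2], [0.1, 0.7]],
               [[0.3, 0.5], [0.0, 0.8]]])
hi = np.array([[[0.8, 0.4], [0.3, 0.9]],
               [[0.5, 0.7], [0.2, 1.0]]])
reward = np.array([[1.0, 0.5],
                   [0.0, 2.0]])
print(robust_value_iteration(lo, hi, reward))
```

Robust policy iteration follows the same pattern: policy evaluation becomes a minimization over the uncertainty set rather than an expectation under a single fixed transition function.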



Acknowledgements

This work was supported by the ERC Starting Grant 101077178 (DEUCE) and the European Union's Horizon 2020 research and innovation programme (FUN2MODEL, grant agreement No. 834115), as well as the NWO grants OCENW.KLEIN.187 and NWA.1160.18.238 (PrimaVera).

Author information


Corresponding author

Correspondence to Marnix Suilen.



Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Suilen, M., Badings, T., Bovy, E.M., Parker, D., Jansen, N. (2025). Robust Markov Decision Processes: A Place Where AI and Formal Methods Meet. In: Jansen, N., et al. Principles of Verification: Cycling the Probabilistic Landscape. Lecture Notes in Computer Science, vol 15262. Springer, Cham. https://doi.org/10.1007/978-3-031-75778-5_7

  • DOI: https://doi.org/10.1007/978-3-031-75778-5_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-75777-8

  • Online ISBN: 978-3-031-75778-5

  • eBook Packages: Computer Science; Computer Science (R0)
