Abstract
We study the planning and verification problems for very large probabilistic systems, such as Markov decision processes (MDPs), from a complexity point of view. More precisely, we address the problem of designing an efficient approximation method to compute a near-optimal policy for the planning problem in discounted MDPs, together with the satisfaction probabilities of properties of interest, such as reachability or safety, over the Markov chain obtained by restricting the MDP to the near-optimal policy. We present two different approaches. The first is based on sparse sampling, while the second uses a variant of the multiplicative weights update algorithm. The complexity of the first approximation method is independent of the size of the state space, and the method requires only a probabilistic generator of the MDP. We give a complete analysis of this approach, whose main control parameter is the targeted quality of the approximation. The second approach is more prospective and differs in that the method can be controlled dynamically by observing its speed of convergence. Parts of this paper were presented in Lassaigne and Peyronnet (in Proceedings of the ACM Symposium on Applied Computing, SAC 2012, pp 1314–1319, ACM 2012), by the same authors.
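The first approach builds on the sparse sampling planner of Kearns, Mansour and Ng (2002): instead of enumerating the state space, the planner draws a fixed number of successor states per action from a generative model of the MDP and recurses to a bounded horizon, which is why its cost is independent of the number of states. The following is a minimal illustrative sketch only, not the paper's algorithm or analysis; the function names, parameter names, and the toy generative model are ours.

```python
def sparse_sample_value(generate, actions, state, depth, width, gamma):
    """Estimate the optimal discounted value of `state` by sparse sampling.

    `generate(s, a)` draws a pair (next_state, reward) from the MDP's
    generative model; only sampled states are ever touched, so the cost
    depends on `depth` and `width`, not on the size of the state space.
    """
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in actions:
        total = 0.0
        for _ in range(width):  # `width` samples per action
            s2, r = generate(state, a)
            total += r + gamma * sparse_sample_value(
                generate, actions, s2, depth - 1, width, gamma)
        best = max(best, total / width)
    return best

def sparse_sample_policy(generate, actions, state, depth, width, gamma):
    """Return the action that is greedy w.r.t. the sampled Q-values."""
    def q(a):
        total = 0.0
        for _ in range(width):
            s2, r = generate(state, a)
            total += r + gamma * sparse_sample_value(
                generate, actions, s2, depth - 1, width, gamma)
        return total / width
    return max(actions, key=q)
```

On a toy deterministic model where the action "stay" always yields reward 1 and "leave" yields 0, the sketch recovers the truncated discounted value exactly; with a stochastic generator, the number of samples per action controls the estimation error, and the total work is on the order of (|actions| · width)^depth calls to the generator.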
References
Alur, R., Henzinger, T.A.: Reactive modules. Formal Methods in System Design 15(1), 7–48 (1999)
Bianco, A., de Alfaro, L.: Model checking of probabilistic and nondeterministic systems. In: Proc. 15th Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS), volume 1026 of Lecture Notes in Computer Science, pp. 499–513. Springer, Berlin (1995)
Arora, S., Hazan, E., Kale, S.: The multiplicative weights update method: a meta-algorithm and applications. Theory Comput. 8(1), 121–164 (2012)
Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47(2–3), 235–256 (2002)
Bertsekas, D.P., Castanon, D.A.: Rollout algorithms for stochastic scheduling problems. J. Heuristics 5(1), 89–108 (1999)
Chang, H.S., Fu, M.C., Hu, J., Marcus, S.I.: A survey of some simulation-based algorithms for Markov decision processes. Commun. Inf. Systems 7(1), 59–92 (2007)
Courcoubetis, C., Yannakakis, M.: The complexity of probabilistic verification. J. ACM (JACM) 42(4), 857–907 (1995)
Fearnley, J.: Exponential lower bounds for policy iteration. In: Automata, Languages and Programming, 37th International Colloquium, ICALP 2010, Bordeaux, France, July 6–10, 2010, Proceedings, Part II, volume 6199 of Lecture Notes in Computer Science, pp. 551–562. Springer, Berlin (2010)
Friedmann, O.: An exponential lower bound for the parity game strategy improvement algorithm as we know it. In: Proceedings of the 24th Annual IEEE Symposium on Logic in Computer Science, LICS 2009, 11–14 August 2009, Los Angeles, CA, USA, pp. 145–156. IEEE Computer Society (2009)
Guirado, G., Hérault, T., Lassaigne, R., Peyronnet, S.: Distribution, approximation and probabilistic model checking. Electr. Notes Theor. Comput. Sci. - Proc. of Parallel and Distributed Model Checking (PDMC) 135(2), 19–30 (2006)
Hamidouche, K., Borghi, A., Esterie, P., Falcou, J., Peyronnet, S.: Three high performance architectures in the parallel approximate probabilistic model checking boat. In: Proc. of Parallel and Distributed Model Checking (PDMC) (2010)
Hansen, T.D., Miltersen, P.B., Zwick, U.: Strategy iteration is strongly polynomial for 2-player turn-based stochastic games with a constant discount factor. In: Innovations in Computer Science - ICS 2010, Tsinghua University, Beijing, China, January 7–9, 2011. Proceedings, pp. 253–263. Tsinghua University Press (2011)
Henriques, D., Martins, J.G., Zuliani, P., Platzer, A., Clarke, E.M.: Statistical model checking for Markov decision processes. In: Ninth International Conference on Quantitative Evaluation of Systems (QEST 2012), pp. 84–93 (2012)
Hérault, T., Lassaigne, R., Magniette, F., Peyronnet, S.: Approximate probabilistic model checking. In: Proceedings of the 5th verification, model checking, and abstract interpretation (VMCAI), volume 2937 of Lecture Notes in Computer Science, pp. 73–84. Springer, Berlin (2004)
Hérault, T., Lassaigne, R., Peyronnet, S.: APMC 3.0: Approximate verification of discrete and continuous time Markov chains. In: Third International Conference on the Quantitative Evaluation of Systems (QEST), pp. 129–130. IEEE Computer Society (2006)
Hinton, A., Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM: A tool for automatic verification of probabilistic systems. In: Tools and algorithms for construction and analysis of systems (TACAS), volume 3920 of Lecture Notes in Computer Science, pp. 441–444. Springer, Berlin (2006)
Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)
Howard, R.A.: Dynamic Programming and Markov Process. MIT Press, Cambridge (1960)
Karp, R.M., Luby, M.: Monte Carlo algorithms for enumeration and reliability problems. In: Proc. of the 24th Annual Symposium on Foundations of Computer Science (FOCS), pp. 56–64. IEEE (1983)
Kearns, M., Mansour, Y., Ng, A.Y.: A sparse sampling algorithm for near-optimal planning in large Markov decision processes. Mach. Learn. 49(2–3), 193–208 (2002)
Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: Verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.), Proc. 23rd International Conference on Computer Aided Verification (CAV’11), volume 6806 of LNCS, pp. 585–591. Springer, Berlin (2011)
Lassaigne, R., Peyronnet, S.: Approximate planning and verification for large Markov decision processes. In: Proceedings of the ACM Symposium on Applied Computing, SAC 2012, pp. 1314–1319. ACM (2012)
Lassaigne, R., Peyronnet, S.: Probabilistic verification and approximation. Ann. Pure Appl. Logic 152(1–3), 122–131 (2008)
Legay, A., Sedwards, S.: Lightweight Monte Carlo algorithm for Markov decision processes. arXiv preprint, arXiv:1310.3609 (2013)
Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, New York (1994)
Segala, R., Lynch, N.: Probabilistic simulations for probabilistic processes. Nordic J. Comput. 2(2), 250–273 (1995)
Tesauro, G., Galperin, G.R.: On-line policy improvement using Monte Carlo search. Adv. Neural Inf. Process. Systems, 1068–1074 (1997)
Vardi, M.Y.: Automatic verification of probabilistic concurrent finite state programs. In: Proc. of the 26th Annual Symposium on Foundations of Computer Science (FOCS), pp. 327–338 (1985)
Ye, Y.: A new complexity result on solving the Markov decision problem. Math. Oper. Res. 30(3), 733–749 (2005)
Cite this article
Lassaigne, R., Peyronnet, S. Approximate planning and verification for large Markov decision processes. Int J Softw Tools Technol Transfer 17, 457–467 (2015). https://doi.org/10.1007/s10009-014-0344-z