
Approximate planning and verification for large Markov decision processes

  • SMC
International Journal on Software Tools for Technology Transfer

Abstract

We study the planning and verification problems for very large probabilistic systems, such as Markov decision processes (MDPs), from a complexity point of view. More precisely, we address the design of an efficient approximation method that computes a near-optimal policy for the planning problem in discounted MDPs, together with the satisfaction probabilities of properties of interest, such as reachability or safety, over the Markov chain obtained by restricting the MDP to that near-optimal policy. We present two approaches. The first is based on sparse sampling; the second uses a variant of the multiplicative weights update algorithm. The complexity of the first method is independent of the size of the state space, and it requires only a probabilistic generator of the MDP. We give a complete analysis of this approach, whose main control parameter is the targeted quality of the approximation. The second approach is more prospective: it can be controlled dynamically by observing its speed of convergence. Parts of this paper were previously presented by the same authors in Lassaigne and Peyronnet (Proceedings of the ACM Symposium on Applied Computing, SAC 2012, pp. 1314–1319, ACM, 2012).
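
To make the first approach concrete, here is a minimal Python sketch of sparse sampling in the style of Kearns, Mansour and Ng (reference 20). It is an illustration under stated assumptions, not the authors' implementation: the names `estimate_q`, `sparse_sampling_action` and `toy_generator`, the parameters `depth` and `width`, and the toy MDP itself are all hypothetical. The only access to the MDP is a generator that samples one transition at a time, which is why the number of generator calls depends on the number of actions, the sampling width and the lookahead depth, but not on the number of states.

```python
import random
from typing import Callable, Hashable, List, Sequence, Tuple

State = Hashable
Action = Hashable
# Generative model: one call samples a single (next_state, reward) pair.
Generator = Callable[[State, Action, random.Random], Tuple[State, float]]


def estimate_q(gen: Generator, actions: Sequence[Action], state: State,
               depth: int, width: int, gamma: float,
               rng: random.Random) -> List[float]:
    """Estimate the discounted Q-value of every action in `state`.

    Builds a lookahead tree of the given depth, drawing `width` samples
    per (state, action) pair and backing up the best estimated value.
    """
    if depth == 0:
        return [0.0 for _ in actions]  # horizon reached: value truncated to 0
    q_values = []
    for action in actions:
        total = 0.0
        for _ in range(width):
            nxt, reward = gen(state, action, rng)
            future = max(estimate_q(gen, actions, nxt, depth - 1,
                                    width, gamma, rng))
            total += reward + gamma * future
        q_values.append(total / width)
    return q_values


def sparse_sampling_action(gen: Generator, actions: Sequence[Action],
                           state: State, depth: int, width: int,
                           gamma: float, rng: random.Random) -> Action:
    """Return the action with the highest estimated Q-value in `state`."""
    q_values = estimate_q(gen, actions, state, depth, width, gamma, rng)
    best = max(range(len(actions)), key=q_values.__getitem__)
    return actions[best]


def toy_generator(state: int, action: str,
                  rng: random.Random) -> Tuple[int, float]:
    """Hypothetical two-action MDP, used only for illustration."""
    if action == "safe":
        return state, 0.5  # fixed reward, state unchanged
    # "risky": reward 1 with probability 0.8, otherwise 0; state advances
    reward = 1.0 if rng.random() < 0.8 else 0.0
    return state + 1, reward


if __name__ == "__main__":
    rng = random.Random(0)
    choice = sparse_sampling_action(toy_generator, ["safe", "risky"], 0,
                                    depth=3, width=4, gamma=0.9, rng=rng)
    print("chosen action:", choice)
```

In this sketch the quality of the approximation is controlled by `depth` and `width`: larger values tighten the estimate at the cost of a lookahead tree whose size grows exponentially in `depth`.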


References

  1. Alur, R., Henzinger, T.A.: Reactive modules. Formal Methods System Design 15(1), 7–48 (1999)


  2. Bianco, A., de Alfaro, L.: Model checking of probabilistic and nondeterministic systems. In: Proc. 15th Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS), volume 1026 of Lecture Notes in Computer Science, pp. 499–513. Springer, Berlin (1995)

  3. Arora, S., Hazan, E., Kale, S.: The multiplicative weights update method: a meta-algorithm and applications. Theory Comput. 8(1), 121–164 (2012)


  4. Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47(2–3), 235–256 (2002)


  5. Bertsekas, D.P., Castanon, D.A.: Rollout algorithms for stochastic scheduling problems. J. Heuristics 5(1), 89–108 (1999)


  6. Chang, H.S., Fu, M.C., Hu, J., Marcus, S.I.: A survey of some simulation-based algorithms for Markov decision processes. Commun. Inf. Systems 7(1), 59–92 (2007)

  7. Courcoubetis, C., Yannakakis, M.: The complexity of probabilistic verification. J. ACM (JACM) 42(4), 857–907 (1995)


  8. Fearnley, J.: Exponential lower bounds for policy iteration. In: Automata, Languages and Programming, 37th International Colloquium, ICALP 2010, Bordeaux, France, July 6–10, 2010, Proceedings, Part II, volume 6199 of Lecture Notes in Computer Science, pp. 551–562. Springer, Berlin (2010)

  9. Friedmann, O.: An exponential lower bound for the parity game strategy improvement algorithm as we know it. In: Proceedings of the 24th Annual IEEE Symposium on Logic in Computer Science, LICS 2009, 11–14 August 2009, Los Angeles, CA, USA, pp. 145–156. IEEE Computer Society (2009)

  10. Guirado, G., Hérault, T., Lassaigne, R., Peyronnet, S.: Distribution, approximation and probabilistic model checking. Electr. Notes Theor. Comput. Sci. - Proc. of Parallel and Distributed Model Checking (PDMC) 135(2), 19–30 (2006)


  11. Hamidouche, K., Borghi, A., Esterie, P., Falcou, J., Peyronnet, S.: Three high performance architectures in the parallel approximate probabilistic model checking boat. In: Proc. of Parallel and Distributed Model Checking (PDMC) (2010)

  12. Hansen, T.D., Miltersen, P.B., Zwick, U.: Strategy iteration is strongly polynomial for 2-player turn-based stochastic games with a constant discount factor. In: Innovations in Computer Science - ICS 2010, Tsinghua University, Beijing, China, January 7–9, 2011. Proceedings, pp. 253–263. Tsinghua University Press (2011)

  13. Henriques, D., Martins, J.G., Zuliani, P., Platzer, A., Clarke, E.M.: Statistical model checking for Markov decision processes. In: Quantitative Evaluation of Systems (QEST), 2012 Ninth International Conference on, pp. 84–93 (2012)

  14. Hérault, T., Lassaigne, R., Magniette, F., Peyronnet, S.: Approximate probabilistic model checking. In: Proceedings of the 5th verification, model checking, and abstract interpretation (VMCAI), volume 2937 of Lecture Notes in Computer Science, pp. 73–84. Springer, Berlin (2004)

  15. Hérault, T., Lassaigne, R., Peyronnet, S.: APMC 3.0: Approximate verification of discrete and continuous time Markov chains. In: Third International Conference on the Quantitative Evaluation of Systems (QEST), pp. 129–130. IEEE Computer Society (2006)

  16. Hinton, A., Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM: A tool for automatic verification of probabilistic systems. In: Tools and algorithms for construction and analysis of systems (TACAS), volume 3920 of Lecture Notes in Computer Science, pp. 441–444. Springer, Berlin (2006)

  17. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)


  18. Howard, R.A.: Dynamic Programming and Markov Processes. MIT Press, Cambridge (1960)


  19. Karp, R.M., Luby, M.: Monte-Carlo algorithms for enumeration and reliability problems. In: Proc. of the 24th Annual Symposium on Foundations of Computer Science (FOCS), pp. 56–64. IEEE (1983)

  20. Kearns, M., Mansour, Y., Ng, A.Y.: A sparse sampling algorithm for near-optimal planning in large Markov decision processes. Mach. Learn. 49(2–3), 193–208 (2002)


  21. Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: Verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.), Proc. 23rd International Conference on Computer Aided Verification (CAV’11), volume 6806 of LNCS, pp. 585–591. Springer, Berlin (2011)

  22. Lassaigne, R., Peyronnet, S.: Approximate planning and verification for large Markov decision processes. In: Proceedings of the ACM Symposium on Applied Computing, SAC 2012, pp. 1314–1319. ACM (2012)

  23. Lassaigne, R., Peyronnet, S.: Probabilistic verification and approximation. Ann. Pure Appl. Logic 152(1–3), 122–131 (2008)


  24. Legay, A., Sedwards, S.: Lightweight Monte Carlo algorithm for Markov decision processes. arXiv preprint, arXiv:1310.3609 (2013)

  25. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, New York (1994)


  26. Segala, R., Lynch, N.: Probabilistic simulations for probabilistic processes. Nordic J. Comput. 2(2), 250–273 (1995)


  27. Tesauro, G., Galperin, G.R.: On-line policy improvement using Monte-Carlo search. Adv. Neural Inf. Process. Systems, 1068–1074 (1997)

  28. Vardi, M.Y.: Automatic verification of probabilistic concurrent finite state programs. In: Proc. of the 26th Foundations of Computer Science (FOCS), pp. 327–338 (1985)

  29. Ye, Y.: A new complexity result on solving the Markov decision problem. Math. Oper. Res. 30(3), 733–749 (2005)



Author information


Corresponding author

Correspondence to Sylvain Peyronnet.


About this article


Cite this article

Lassaigne, R., Peyronnet, S. Approximate planning and verification for large Markov decision processes. Int J Softw Tools Technol Transfer 17, 457–467 (2015). https://doi.org/10.1007/s10009-014-0344-z



  • DOI: https://doi.org/10.1007/s10009-014-0344-z
