Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations

  • Conference paper
  • First Online:
Automated Technology for Verification and Analysis (ATVA 2016)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9938)

Abstract

We consider the problem of finding an optimal policy in a Markov decision process that maximises the expected discounted sum of rewards over an infinite time horizon. Since the explicit iterative dynamic programming scheme does not scale as the dimension of the state space increases, a number of approximate methods have been developed. These are typically based on value or policy iteration, enabling further speedups through lumped and distributed updates, or by employing succinct representations of the value functions. However, none of the existing approximate techniques provides general, explicit and tunable bounds on the approximation error, a problem particularly relevant when the level of accuracy affects the optimality of the policy. In this paper we propose a new approximate policy iteration scheme that mitigates the state-space explosion problem by adaptive state-space aggregation, while at the same time providing rigorous and explicit error bounds that can be used to control the optimality level of the obtained policy. We evaluate the new approach on a case study, demonstrating that the state-space reduction considerably accelerates the policy iteration scheme while meeting the required level of precision.
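For concreteness, exact policy iteration — the baseline scheme whose poor scaling in the state-space dimension motivates this work — can be sketched as follows. This is a minimal illustrative sketch only: the two-state MDP, its rewards and the discount factor are invented for the example and are not taken from the paper.

```python
import numpy as np

# Toy 2-state, 2-action MDP (illustrative assumption, not from the paper).
# P[a][s, s'] = transition probability under action a; R[a][s] = reward.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # action 0
     np.array([[0.5, 0.5], [0.0, 1.0]])]   # action 1
R = [np.array([1.0, 0.0]),
     np.array([0.0, 2.0])]
gamma = 0.9  # discount factor

def policy_iteration(P, R, gamma, max_iter=100):
    n = P[0].shape[0]
    policy = np.zeros(n, dtype=int)
    v = np.zeros(n)
    for _ in range(max_iter):
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n)])
        r_pi = np.array([R[policy[s]][s] for s in range(n)])
        v = np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
        # Policy improvement: greedy one-step lookahead over actions.
        q = np.array([R[a] + gamma * P[a] @ v for a in range(len(P))])
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            break  # policy is stable, hence optimal
        policy = new_policy
    return policy, v

policy, v = policy_iteration(P, R, gamma)
print(policy, v)
```

Each iteration solves an n-by-n linear system, which is exactly the step that becomes prohibitive for large state spaces; the aggregation scheme of the paper replaces this exact evaluation with a cheaper lumped computation carrying explicit error bounds.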

This work has been partially supported by the ERC Advanced Grant VERIWARE, the EPSRC Mobile Autonomy Programme Grant EP/M019918/1, the Czech Grant Agency grant No. GA16-17538S (M. Češka), and the John Fell Oxford University Press (OUP) Research Fund.



Author information

Corresponding author: Milan Češka.


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Abate, A., Češka, M., Kwiatkowska, M. (2016). Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations. In: Artho, C., Legay, A., Peled, D. (eds) Automated Technology for Verification and Analysis. ATVA 2016. Lecture Notes in Computer Science, vol. 9938. Springer, Cham. https://doi.org/10.1007/978-3-319-46520-3_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-46520-3_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46519-7

  • Online ISBN: 978-3-319-46520-3

  • eBook Packages: Computer Science (R0)
