Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations

  • Conference paper
  • First Online:
Automated Technology for Verification and Analysis (ATVA 2016)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9938)

Abstract

We consider the problem of finding an optimal policy in a Markov decision process that maximises the expected discounted sum of rewards over an infinite time horizon. Since the explicit iterative dynamic programming scheme does not scale as the dimension of the state space increases, a number of approximate methods have been developed. These are typically based on value or policy iteration, enabling further speedups through lumped and distributed updates, or by employing succinct representations of the value functions. However, none of the existing approximate techniques provides general, explicit and tunable bounds on the approximation error, a problem particularly relevant when the level of accuracy affects the optimality of the policy. In this paper we propose a new approximate policy iteration scheme that mitigates the state-space explosion problem by adaptive state-space aggregation, while at the same time providing rigorous and explicit error bounds that can be used to control the optimality level of the obtained policy. We evaluate the new approach on a case study, demonstrating that the state-space reduction considerably accelerates the policy iteration scheme while meeting the required level of precision.
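For concreteness, exact policy iteration — the baseline scheme whose poor scaling in the state-space dimension motivates this work — can be sketched as follows. This is a minimal illustrative sketch only: the two-state MDP, its rewards and the discount factor are invented for the example and are not taken from the paper.

```python
import numpy as np

# Toy 2-state, 2-action MDP (illustrative assumption, not from the paper).
# P[a][s, s'] = transition probability under action a; R[a][s] = reward.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # action 0
     np.array([[0.5, 0.5], [0.0, 1.0]])]   # action 1
R = [np.array([1.0, 0.0]),
     np.array([0.0, 2.0])]
gamma = 0.9  # discount factor

def policy_iteration(P, R, gamma, max_iter=100):
    n = P[0].shape[0]
    policy = np.zeros(n, dtype=int)
    v = np.zeros(n)
    for _ in range(max_iter):
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n)])
        r_pi = np.array([R[policy[s]][s] for s in range(n)])
        v = np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
        # Policy improvement: greedy one-step lookahead over actions.
        q = np.array([R[a] + gamma * P[a] @ v for a in range(len(P))])
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            break  # policy is stable, hence optimal
        policy = new_policy
    return policy, v

policy, v = policy_iteration(P, R, gamma)
print(policy, v)
```

Each iteration solves an n-by-n linear system, which is exactly the step that becomes prohibitive for large state spaces; the aggregation scheme of the paper replaces this exact evaluation with a cheaper lumped computation carrying explicit error bounds.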

This work has been partially supported by the ERC Advanced Grant VERIWARE, the EPSRC Mobile Autonomy Programme Grant EP/M019918/1, the Czech Grant Agency grant No. GA16-17538S (M. Češka), and the John Fell Oxford University Press (OUP) Research Fund.



Author information

Corresponding author: Milan Češka.


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Abate, A., Češka, M., Kwiatkowska, M. (2016). Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations. In: Artho, C., Legay, A., Peled, D. (eds) Automated Technology for Verification and Analysis. ATVA 2016. Lecture Notes in Computer Science, vol. 9938. Springer, Cham. https://doi.org/10.1007/978-3-319-46520-3_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-46520-3_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46519-7

  • Online ISBN: 978-3-319-46520-3

  • eBook Packages: Computer Science (R0)
