Bounded Aggregation for Continuous Time Markov Decision Processes

Buchholz, Peter; Dohndorf, Iryna; Frank, Alexander; Scheftelowitsch, Dimitri

doi:10.1007/978-3-319-66583-2_2

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10497)

Included in the following conference series:

European Workshop on Performance Engineering

Abstract

Markov decision processes suffer from two problems: the so-called state space explosion, which may lead to long computation times, and the memoryless property of states, which limits the modeling power with respect to real systems. In this paper we combine existing state aggregation and optimization methods into a new aggregation-based optimization method. More specifically, we compute reward bounds on an aggregated model, trading state space size for uncertainty. We propose an approach for continuous time Markov decision processes (CTMDPs) with discounted or average reward measures.

The approach starts with a partitioned state space consisting of blocks that represent an abstract, high-level view of the state space. The sojourn time in each block can then be represented by a phase-type distribution (PHD). Using known properties of PHDs, we bound the sojourn times in the blocks and the reward accumulated during each sojourn. Constraining the set of possible initial vectors yields tighter bounds for the sojourn times and, ultimately, for the average or discounted reward measures. Furthermore, given a fixed policy for the CTMDP, the initial vector can be constrained further, which improves the reward bounds. The aggregation approach is illustrated on randomly generated models.
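To make the role of the initial vector concrete, the following is a minimal sketch and not the authors' algorithm. It relies only on the standard fact that a PHD with initial vector pi and subgenerator D0 has mean pi (-D0)^{-1} 1, so bounding the mean sojourn time over a set of admissible initial vectors is a small linear program. The function names, the box constraints `lower` and `upper`, and the example matrix are assumptions made for illustration; the paper's bounds also cover accumulated rewards and policies, which this sketch omits.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def mean_sojourn_per_phase(D0):
    """Return m = (-D0)^{-1} 1, the expected time to leave the block from each phase."""
    n = len(D0)
    neg_D0 = [[-D0[i][j] for j in range(n)] for i in range(n)]
    return solve_linear(neg_D0, [1.0] * n)


def bound_mean_sojourn(D0, lower=None, upper=None):
    """Bound pi . m over all pi with lower <= pi <= upper and sum(pi) == 1.

    The box constraints are a hypothetical way to describe the admissible
    initial vectors; with no constraints, pi ranges over all distributions.
    """
    m = mean_sojourn_per_phase(D0)
    n = len(m)
    lower = [0.0] * n if lower is None else list(lower)
    upper = [1.0] * n if upper is None else list(upper)
    if sum(lower) > 1 + 1e-12 or sum(upper) < 1 - 1e-12:
        raise ValueError("no probability vector satisfies the constraints")
    if any(lo > up for lo, up in zip(lower, upper)):
        raise ValueError("lower bound exceeds upper bound")

    def extreme(order):
        # Greedy solution of the LP: start at the lower bounds and hand the
        # remaining probability mass to phases in the given order.
        pi = lower[:]
        slack = 1.0 - sum(pi)
        for i in order:
            step = min(upper[i] - pi[i], slack)
            pi[i] += step
            slack -= step
        return sum(p * v for p, v in zip(pi, m))

    ascending = sorted(range(n), key=lambda i: m[i])
    return extreme(ascending), extreme(ascending[::-1])


D0 = [[-2.0, 2.0], [0.0, -1.0]]  # hypothetical two-phase block
print(bound_mean_sojourn(D0))
print(bound_mean_sojourn(D0, lower=[0.5, 0.0], upper=[1.0, 0.5]))
```

In this example the second call restricts the initial vector and returns a narrower interval than the first, which mirrors the tightening effect described in the paragraph above.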

Author information

Authors and Affiliations

Department of Computer Science, TU Dortmund, Dortmund, Germany
Peter Buchholz, Iryna Dohndorf, Alexander Frank & Dimitri Scheftelowitsch

Corresponding author

Correspondence to Dimitri Scheftelowitsch.

Editor information

Editors and Affiliations

Bristol, UK
Philipp Reinecke
University of L’Aquila, L’Aquila, Italy
Antinisca Di Marco

About this paper

Cite this paper

Buchholz, P., Dohndorf, I., Frank, A., Scheftelowitsch, D. (2017). Bounded Aggregation for Continuous Time Markov Decision Processes. In: Reinecke, P., Di Marco, A. (eds) Computer Performance Engineering. EPEW 2017. Lecture Notes in Computer Science, vol 10497. Springer, Cham. https://doi.org/10.1007/978-3-319-66583-2_2

DOI: https://doi.org/10.1007/978-3-319-66583-2_2
Published: 13 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66582-5
Online ISBN: 978-3-319-66583-2
eBook Packages: Computer Science, Computer Science (R0)
