Multi-objective Discounted Reward Verification in Graphs and MDPs

Chatterjee, Krishnendu; Forejt, Vojtěch; Wojtczak, Dominik

doi:10.1007/978-3-642-45221-5_17

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8312)

Included in the following conference series:

International Conference on Logic for Programming Artificial Intelligence and Reasoning

Abstract

We study the problem of achieving a given value in Markov decision processes (MDPs) with several independent discounted reward objectives. We consider a generalised version of discounted reward objectives, in which the amount of discounting depends on the states visited and on the objective. This definition extends the usual definition of discounted reward, and makes it possible to capture systems in which the values of different commodities diminish at different and variable rates.
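
As an informal illustration of the generalised objective, the sketch below evaluates a finite path prefix under one plausible reading of the definition: the reward collected at step j for objective i is weighted by the product of the discount factors of objective i in the states visited before step j. State-based rewards, the dictionary layout, and the name `discounted_rewards` are assumptions made for this example and are not taken from the paper.

```python
from fractions import Fraction

def discounted_rewards(path, rewards, discounts):
    """Per-objective generalised discounted reward of a finite path prefix.

    rewards[i][s]:   reward of objective i in state s (assumed state-based)
    discounts[i][s]: discount factor of objective i in state s, in [0, 1)
    """
    totals = []
    for r_i, d_i in zip(rewards, discounts):
        total, weight = Fraction(0), Fraction(1)
        for s in path:
            total += weight * r_i[s]   # reward weighted by discounts seen so far
            weight *= d_i[s]           # visiting s discounts everything after it
        totals.append(total)
    return totals

# Two commodities whose value decays at different, state-dependent rates.
rewards = [{"a": 1, "b": 0}, {"a": 0, "b": 2}]
discounts = [
    {"a": Fraction(1, 2), "b": Fraction(1, 2)},
    {"a": Fraction(1, 3), "b": Fraction(1, 4)},
]
values = discounted_rewards(["a", "b", "a"], rewards, discounts)
```

With a single objective and one constant discount factor, this reduces to the usual discounted sum.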

We establish results for two prominent subclasses of the problem, namely state-discount models, where the discount factors depend only on the state of the MDP (and are independent of the objective), and reward-discount models, where they depend only on the objective (but not on the state of the MDP). For the state-discount models we use a straightforward reduction to expected total reward and show that the problem of whether a value is achievable can be solved in polynomial time. For the reward-discount models we show that memory and randomisation of the strategies are required, but that the problem is nevertheless decidable, and that it is sufficient to consider strategies which behave in a memoryless way after a certain number of steps.
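
One standard way to turn state-dependent discounting into expected total reward is to treat the discount factor as a survival probability: each transition out of state s is scaled by the discount factor of s, and the remaining probability mass leads to an absorbing state with zero reward. The sketch below shows this textbook construction; whether the paper's reduction has exactly this shape is an assumption, and the names `to_total_reward` and `SINK` are hypothetical.

```python
from fractions import Fraction

SINK = "sink"  # hypothetical name for the added absorbing, zero-reward state

def to_total_reward(transitions, discount):
    """Build the transition function of a total-reward MDP.

    transitions[s][a]: dict mapping next state -> probability
    discount[s]:       discount factor of state s, in [0, 1)
    """
    new = {SINK: {"stay": {SINK: Fraction(1)}}}
    for s, actions in transitions.items():
        new[s] = {}
        for a, dist in actions.items():
            # Continue as before with probability discount[s] ...
            scaled = {t: discount[s] * p for t, p in dist.items()}
            # ... and stop collecting reward with the remaining probability.
            scaled[SINK] = 1 - discount[s]
            new[s][a] = scaled
    return new

transitions = {
    "u": {"go": {"u": Fraction(1, 2), "v": Fraction(1, 2)}},
    "v": {"loop": {"v": Fraction(1)}},
}
discount = {"u": Fraction(1, 2), "v": Fraction(1, 3)}
mdp = to_total_reward(transitions, discount)
```

Rewards are left unchanged and the sink yields reward zero for every objective. Because the discount depends only on the state, one transformed MDP serves all objectives at once, which is what makes this case amenable to existing multi-objective total-reward techniques.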

For the general case, we show that when restricted to graphs (i.e. MDPs with no randomisation), pure strategies and discount factors of the form 1/n where n is an integer, the problem is in PSPACE and finite memory suffices for achieving a given value. We also show that when the discount factors are not of the form 1/n, the memory required by a strategy can be infinite.
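
In a graph, a pure finite-memory strategy produces an eventually periodic path: a finite prefix followed by a cycle repeated forever. The value of such a path can be computed exactly with a geometric series, as in the sketch below for a single objective. The same weighting convention as above is assumed (state-based rewards, each visited state discounting everything after it), the function names are hypothetical, and the cycle must be non-empty with all discount factors below 1.

```python
from fractions import Fraction

def segment_value(states, reward, discount):
    """Return (discounted sum over `states`, product of their discount factors)."""
    total, weight = Fraction(0), Fraction(1)
    for s in states:
        total += weight * reward[s]
        weight *= discount[s]
    return total, weight

def lasso_value(prefix, cycle, reward, discount):
    """Exact value of the infinite path: prefix, then cycle repeated forever."""
    p_sum, p_weight = segment_value(prefix, reward, discount)
    c_sum, c_weight = segment_value(cycle, reward, discount)
    # Geometric series: the cycle contributes c_sum * (1 + c_weight + c_weight**2 + ...).
    return p_sum + p_weight * c_sum / (1 - c_weight)

reward = {"a": 1, "b": 0}
discount = {"a": Fraction(1, 2), "b": Fraction(1, 3)}  # factors of the form 1/n
value = lasso_value(["a"], ["b", "a"], reward, discount)
```

Checking whether a candidate strategy of this shape achieves a given vector of values then amounts to one such computation per objective, followed by a componentwise comparison.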

Author information

Authors and Affiliations

IST, Austria
Krishnendu Chatterjee
Department of Computer Science, University of Oxford, UK
Vojtěch Forejt
Department of Computer Science, University of Liverpool, UK
Dominik Wojtczak

Editor information

Editors and Affiliations

Microsoft Research, Redmond, WA, USA
Ken McMillan
University of Innsbruck, Austria
Aart Middeldorp
University of Manchester, UK
Andrei Voronkov

About this paper

Cite this paper

Chatterjee, K., Forejt, V., Wojtczak, D. (2013). Multi-objective Discounted Reward Verification in Graphs and MDPs. In: McMillan, K., Middeldorp, A., Voronkov, A. (eds) Logic for Programming, Artificial Intelligence, and Reasoning. LPAR 2013. Lecture Notes in Computer Science, vol 8312. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45221-5_17

DOI: https://doi.org/10.1007/978-3-642-45221-5_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45220-8
Online ISBN: 978-3-642-45221-5
eBook Packages: Computer Science, Computer Science (R0)
