Abstract
The usual optimality criteria for Markov decision processes (e.g. total discounted reward or mean reward) may be insufficient to fully capture the preferences of a decision maker. It may be preferable to select more sophisticated criteria that also reflect the variability-risk features of the problem. To this end we focus attention on risk-sensitive optimality criteria (i.e. the case when the stream of rewards generated by the Markov process is evaluated through an exponential utility function) and their connections with mean-variance optimality (i.e. the case when a suitable combination of the expected total reward and its variance, usually considered per transition, is selected as the optimality criterion). Research on risk-sensitive optimality criteria in Markov decision processes was initiated in the seminal paper by Howard and Matheson [6] and followed by many other researchers (see e.g. [1, 2, 3, 4, 5, 8, 9, 14]).

In this note we consider a Markov decision chain \( X = \{X_n,\ n = 0, 1, \ldots\} \) with finite state space \( \mathcal{I} = \{1, 2, \ldots, N\} \) and a finite set \( \mathcal{A}_i = \{1, 2, \ldots, K_i\} \) of possible decisions (actions) in state \( i \in \mathcal{I} \). If action \( k \in \mathcal{A}_i \) is selected in state \( i \in \mathcal{I} \), then state \( j \) is reached in the next transition with a given probability \( p_{ij}^{k} \), and a one-stage transition reward \( r_{ij} \) is accrued to this transition.
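The exponential-utility evaluation described above leads to a multiplicative dynamic-programming recursion \( U_{n+1}(i) = \sum_j p_{ij}\, e^{\gamma r_{ij}}\, U_n(j) \), so that \( U_n(i) = \mathbb{E}_i\!\left[e^{\gamma \sum_{t} r_{X_t X_{t+1}}}\right] \) and the long-run certainty equivalent per transition is governed by the spectral radius of the kernel \( p_{ij} e^{\gamma r_{ij}} \). The following sketch illustrates this for a hypothetical two-state chain with a single action per state; the transition probabilities, rewards, and risk parameter are illustrative numbers, not taken from the paper.

```python
import numpy as np

# Hypothetical 2-state Markov chain with one action per state.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])   # transition probabilities p_ij
R = np.array([[1.0, 2.0],
              [0.0, 3.0]])   # one-stage rewards r_ij
gamma = 0.1                  # risk-sensitivity parameter (gamma > 0: risk-seeking)

# "Risk-sensitive" kernel: entrywise p_ij * exp(gamma * r_ij).
Q = P * np.exp(gamma * R)

# Multiplicative DP recursion: U_{n+1}(i) = sum_j Q_ij U_n(j),
# so U_n(i) = E_i[ exp(gamma * total reward over n transitions) ].
n = 20
U = np.ones(2)
for _ in range(n):
    U = Q @ U

# Certainty equivalent per transition, (1/(gamma*n)) * log U_n(i),
# converges to log(spectral radius of Q) / gamma as n grows.
cert_equiv = np.log(U) / (gamma * n)
rho = max(abs(np.linalg.eigvals(Q)))
print(cert_equiv, np.log(rho) / gamma)
```

Note that the recursion is linear but not stochastic: the row sums of \( Q \) exceed one for \( \gamma > 0 \), which is why the growth rate (spectral radius), rather than a fixed point, carries the optimality information.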
References
Bielecki TR, Hernández-Hernández D, Pliska SR (1999) Risk-sensitive control of finite state Markov chains in discrete time, with application to portfolio management. Math Methods Oper Res 50:167–188
Cavazos-Cadena R, Montes-de-Oca R (2003) The value iteration algorithm in risk-sensitive average Markov decision chains with finite state space. Math Oper Res 28:752–756
Cavazos-Cadena R (2003) Solution to the risk-sensitive average cost optimality equation in a class of Markov decision processes with finite state space. Math Methods Oper Res 57:253–285
Jaquette SA (1976) A utility criterion for Markov decision processes. Manag Sci 23:43–49
Hinderer K, Waldmann KH (2003) The critical discount factor for finite Markovian decision processes with an absorbing set. Math Methods Oper Res 57:1–19
Howard RA, Matheson JE (1972) Risk-sensitive Markov decision processes. Manag Sci 18:356–369
Puterman ML (1994) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York
Rothblum UG, Whittle P (1982) Growth optimality for branching Markov decision chains. Math Oper Res 7:582–601
Sladký K (1976) On dynamic programming recursions for multiplicative Markov decision chains. Math Programming Study 6:216–226
Sladký K (1980) Bounds on discrete dynamic programming recursions I. Kybernetika 16:526–547
Sladký K (1981) On the existence of stationary optimal policies in discrete dynamic programming. Kybernetika 17:489–513
Sladký K (2005) On mean reward variance in semi-Markov processes. Math Methods Oper Res 62:387–397
Whittle P (1983) Optimization over time: dynamic programming and stochastic control, vol II, chap 35. Wiley, Chichester
Zijm WHM (1983) Nonnegative matrices in dynamic programming. Mathematical Centre Tract, Amsterdam
© 2007 Springer-Verlag Berlin Heidelberg
Sladký, K. (2007). Risk-Sensitive Optimality Criteria in Markov Decision Processes. In: Waldmann, KH., Stocker, U.M. (eds) Operations Research Proceedings 2006. Operations Research Proceedings, vol 2006. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69995-8_88
Print ISBN: 978-3-540-69994-1
Online ISBN: 978-3-540-69995-8