Risk-Sensitive Optimality Criteria in Markov Decision Processes

  • Conference paper
Operations Research Proceedings 2006

Part of the book series: Operations Research Proceedings (ORP, volume 2006)

Abstract

The usual optimization criteria for Markov decision processes (e.g. total discounted reward or mean reward) may be insufficient to fully capture the preferences of a decision maker. It can be preferable to select more sophisticated criteria that also reflect the variability-risk features of the problem. To this end we focus attention on risk-sensitive optimality criteria (i.e. the case when the expectation of the stream of rewards generated by the Markov process is evaluated by an exponential utility function) and their connections with mean-variance optimality (i.e. the case when a suitable combination of the expected total reward and its variance, usually considered per transition, is selected as a reasonable optimality criterion). Research on risk-sensitive optimality criteria in Markov decision processes was initiated in the seminal paper by Howard and Matheson [6] and followed by many other researchers (see e.g. [1, 2, 3, 4, 5, 8, 9, 14]). In this note we consider a Markov decision chain \( X = \{X_n,\ n = 0, 1, \ldots\} \) with finite state space \( \mathcal{I} = \{1, 2, \ldots, N\} \) and a finite set \( \mathcal{A}_i = \{1, 2, \ldots, K_i\} \) of possible decisions (actions) in state \( i \in \mathcal{I} \). If action \( k \in \mathcal{A}_i \) is selected in state \( i \in \mathcal{I} \), then state \( j \) is reached in the next transition with a given probability \( p_{ij}^k \), and the one-stage transition reward \( r_{ij} \) is accrued to that transition.
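
To make the exponential-utility criterion concrete, the following is a minimal sketch of finite-horizon risk-sensitive value iteration in the spirit of Howard and Matheson [6]. It is not taken from the paper: the function name, the toy data, and the choice of a risk-seeking coefficient \( \gamma > 0 \) (so that maximizing expected utility amounts to maximizing \( \mathbb{E}\,[\exp(\gamma \sum_n r_{X_n X_{n+1}})] \)) are all illustrative assumptions.

```python
import numpy as np

def risk_sensitive_value_iteration(P, r, gamma, n_steps):
    """Finite-horizon risk-sensitive value iteration (illustrative sketch).

    P[k, i, j] : probability of moving i -> j under action k
    r[i, j]    : one-stage transition reward r_ij
    gamma      : risk-sensitivity coefficient (> 0: risk-seeking case)

    Iterates the multiplicative recursion
        V_{n+1}(i) = max_k  sum_j  p_ij^k * exp(gamma * r_ij) * V_n(j),
    where V_n(i) is the maximal expected exponential utility of the
    n-step reward stream started in state i.
    """
    K, N, _ = P.shape
    Q = P * np.exp(gamma * r)        # utility-weighted matrices Q_k
    V = np.ones(N)                   # V_0(i) = exp(gamma * 0) = 1
    policy = np.zeros(N, dtype=int)
    for _ in range(n_steps):
        candidates = Q @ V           # shape (K, N): one row per action
        policy = candidates.argmax(axis=0)
        V = candidates.max(axis=0)
    return V, policy

# Toy instance with N = 2 states and K = 2 actions (made-up numbers).
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.6, 0.4]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])          # rewards r_ij, shared by both actions
V, policy = risk_sensitive_value_iteration(P, r, gamma=0.1, n_steps=50)
print(V, policy)
```

Since \( V_n \) grows geometrically, for long horizons one would renormalize \( V \) at every step; the normalizing factor then converges to the spectral radius of the matrix \( Q \) induced by an optimal stationary policy, and its logarithm divided by \( \gamma \) gives the long-run risk-sensitive average reward.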


References

  1. Bielecki TR, Hernández-Hernández D, Pliska SR (1999) Risk-sensitive control of finite state Markov chains in discrete time, with application to portfolio management. Math Methods Oper Res 50:167–188

  2. Cavazos-Cadena R, Montes-de-Oca R (2003) The value iteration algorithm in risk-sensitive average Markov decision chains with finite state space. Math Oper Res 28:752–756

  3. Cavazos-Cadena R (2003) Solution to the risk-sensitive average cost optimality equation in a class of Markov decision processes with finite state space. Math Methods Oper Res 57:253–285

  4. Jaquette SA (1976) A utility criterion for Markov decision processes. Manag Sci 23:43–49

  5. Hinderer K, Waldmann KH (2003) The critical discount factor for finite Markovian decision processes with an absorbing set. Math Methods Oper Res 57:1–19

  6. Howard RA, Matheson JE (1972) Risk-sensitive Markov decision processes. Manag Sci 18:356–369

  7. Puterman ML (1994) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York

  8. Rothblum UG, Whittle P (1982) Growth optimality for branching Markov decision chains. Math Oper Res 7:582–601

  9. Sladký K (1976) On dynamic programming recursions for multiplicative Markov decision chains. Math Programming Study 6:216–226

  10. Sladký K (1980) Bounds on discrete dynamic programming recursions I. Kybernetika 16:526–547

  11. Sladký K (1981) On the existence of stationary optimal policies in discrete dynamic programming. Kybernetika 17:489–513

  12. Sladký K (2005) On mean reward variance in semi-Markov processes. Math Methods Oper Res 62:387–397

  13. Whittle P (1983) Optimization over time: dynamic programming and stochastic control, vol II, chap 35. Wiley, Chichester

  14. Zijm WHM (1983) Nonnegative matrices in dynamic programming. Mathematical Centre Tract, Amsterdam

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

Cite this paper

Sladký, K. (2007). Risk-Sensitive Optimality Criteria in Markov Decision Processes. In: Waldmann, KH., Stocker, U.M. (eds) Operations Research Proceedings 2006. Operations Research Proceedings, vol 2006. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69995-8_88
