
Optimal halting policies in Markov population decision chains with constant risk posture

Annals of Operations Research


Abstract

This paper concerns infinite-horizon Markov population decision chains with finite state-action space, where one exerts control on a population of individuals in different states by assigning an action to each individual in the system in each period. In every transition, each individual earns a random reward and generates a random progeny vector. The objective of the decision maker is to maximize expected (infinite-horizon) system utility under the following assumptions: (i) The utility function exhibits constant risk posture, (ii) the (random) progeny vectors of distinct individuals are independent, and (iii) the progeny vectors of individuals in a state who take the same action are identically distributed. The paper deals with the problem of finding an optimal stationary halting policy and shows that this problem can be solved efficiently using successive approximations with the original state-action space without enlarging it to include information about the population in each state or any other aspect of the system history in a state. The proposed algorithm terminates after a finite number of iterations with an optimal stationary halting policy or with proof of nonexistence.
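The successive-approximations scheme the abstract refers to can be pictured with a small value-iteration sketch. The sketch below is illustrative only: it assumes a risk-seeking exponential utility (so maximizing expected utility amounts to maximizing E[exp(γw)] with γ > 0), assumes the data for each state-action pair are given as a finite list of (probability, reward, progeny) outcomes, and uses a numerical tolerance in place of the paper's exact finite-termination test. All names and data structures here are hypothetical, not the paper's.

```python
import math

# A minimal sketch of successive approximations on the original state space.
# Assumptions (not from the paper): risk-seeking exponential utility with
# coefficient gamma > 0; outcomes[s, a] is a finite list of
# (probability, reward, progeny) triples, where progeny maps successor
# states to individual counts.

def successive_approximations(states, actions, outcomes, gamma,
                              max_iter=1000, tol=1e-10):
    """V[s]: optimal expected utility factor of one individual in state s."""
    V = {s: 1.0 for s in states}              # V_0: halt every individual now
    for _ in range(max_iter):
        V_new = {}
        for s in states:
            best = 1.0                        # halting contributes factor 1
            for a in actions[s]:
                # E[ exp(gamma * r) * prod_j V[j] ** progeny_j ]
                val = 0.0
                for p, r, progeny in outcomes[s, a]:
                    factor = math.exp(gamma * r)
                    for j, n in progeny.items():
                        factor *= V[j] ** n
                    val += p * factor
                best = max(best, val)
            V_new[s] = best
        # The paper's algorithm has an exact finite-termination test; this
        # sketch settles for a numerical tolerance instead.
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            break
        V = V_new
    return V_new
```

Because the progeny vectors of distinct individuals are independent and identically distributed within a state-action pair, the system value factors over individuals, which is why the iteration above can run on the original state space rather than on population counts.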


Notes

  1. The theory in this paper holds true when rewards are allowed to be −∞, with the convention that ∞^0 = 1. This would be the case in practice when a state is to be avoided at all costs, or when an action is in fact infeasible. However, there is no loss of generality in assuming that no such states exist, since they can be eliminated from the system by preprocessing the data. To do this, all actions with positive probability of generating a −∞ reward are first stripped out of the system. If a state is then left with no admissible action, that state is deleted, along with all actions in the remaining states that lead to it. The problem then becomes that of finding an optimal stationary halting policy in the reduced system; a code sketch of this preprocessing appears after these notes.

  2. Whenever 1 follows a row vector, it corresponds to the column vector of ones.

  3. The reader may wonder why we merely seek a policy with maximum lim sup when such a policy is assured to be nearly optimal for the N-period problem only for some, but not all large N. The answer is that in all cases, we exhibit a stationary optimal policy δ for which the lim sup in (3) equals the lim inf. Then δ will be nearly optimal for the N-period problem for all large N.

  4. Rothblum refers to halting policies as nilpotent policies.

  5. Throughout, 0^0 ≡ 1.

  6. The term increasing (decreasing) is used in the weak sense of nondecreasing (nonincreasing).
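The preprocessing described in note 1 is mechanical enough to sketch in code. The fragment below is a hypothetical rendering under the same illustrative data layout as the value-iteration sketch above (finite outcome lists per state-action pair); it is not the paper's procedure verbatim.

```python
# A minimal sketch of the preprocessing in note 1: first strip every action
# with positive probability of a reward of -infinity, then repeatedly delete
# states left with no admissible action, together with all actions in the
# remaining states that can place progeny in a deleted state.  The data
# layout (outcomes[s, a] as (probability, reward, progeny) triples) is an
# assumption for illustration, not the paper's.

def preprocess(states, actions, outcomes):
    # Step 1: drop actions that can generate a -infinity reward.
    admissible = {
        s: [a for a in actions[s]
            if all(r != float("-inf")
                   for p, r, progeny in outcomes[s, a] if p > 0)]
        for s in states
    }
    alive = set(states)
    changed = True
    while changed:
        changed = False
        # Step 2: delete states with no admissible action left ...
        dead = {s for s in alive if not admissible[s]}
        if dead:
            alive -= dead
            changed = True
            # ... and every action in the remaining states that can send
            # progeny into a deleted state; repeat until nothing changes.
            for s in alive:
                kept = [a for a in admissible[s]
                        if all(j in alive
                               for p, r, progeny in outcomes[s, a] if p > 0
                               for j in progeny)]
                admissible[s] = kept
    return alive, admissible
```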

References

  • Arrow, K. J. (1965). Aspects of the theory of risk-bearing. Yrjö Jahnsson Lectures, Helsinki.

  • Canakoglu, E., & Ozekici, S. (2009). Portfolio selection in stochastic markets with exponential utility functions. Annals of Operations Research, 166, 281–297.


  • Canbolat, P. G. (2009). Markov population decision chains with constant risk posture. Doctoral dissertation, Department of Management Science and Engineering, Stanford University, Stanford, CA.

  • Cavazos-Cadena, R., & Montes-de-Oca, R. (2003). The value iteration algorithm in risk-sensitive average Markov decision chains. Mathematics of Operations Research, 28, 752–776.


  • Cavazos-Cadena, R., & Hernández-Hernández, D. (2005). A characterization of the optimal risk-sensitive average cost in finite controlled Markov chains. The Annals of Applied Probability, 15, 175–212.


  • Choi, S., & Ruszczynski, A. (2011). A multi-product risk-averse newsvendor with exponential utility function. European Journal of Operational Research, 214, 78–84.


  • Denardo, E. V., & Rothblum, U. G. (1979). Optimal stopping, exponential utility, and linear programming. Mathematical Programming, 16, 228–244.


  • Denardo, E. V., & Rothblum, U. G. (2006). A turnpike theorem for a risk-sensitive Markov decision process with stopping. SIAM Journal on Control and Optimization, 45, 414–431.


  • Denardo, E. V., Park, H., & Rothblum, U. G. (2007). Risk-neutral and risk-sensitive multiarmed bandits. Mathematics of Operations Research, 32, 374–394.


  • Denardo, E. V., Rothblum, U. G., & van der Heyden, L. (2004). Index policies for stochastic search in a forest with an application to R&D project management. Mathematics of Operations Research, 29, 162–181.


  • Di Masi, G. B., & Stettner, L. (2000). Risk sensitive control of discrete-time Markov processes with infinite horizon. SIAM Journal on Control and Optimization, 38, 61–78.


  • Erickson, R. E. (1978). Minimum-concave-cost single-source network flows. Doctoral dissertation, Department of Operations Research, Stanford University, Stanford, CA.

  • Erickson, R. E. (1988). Optimality of stationary halting policies and finite termination of successive approximations. Mathematics of Operations Research, 13, 90–98.


  • Giri, B. C. (2011). Managing inventory with two suppliers under yield uncertainty and risk aversion. International Journal of Production Economics, 133, 80–85.


  • Howard, R. A. (1971). Proximal decision analysis. Management Science, 17, 507–541.


  • Howard, R. A., & Matheson, J. E. (1972). Risk-sensitive Markov decision processes. Management Science, 18, 356–369.


  • Li, J., & Deshpande, A. (2011). Maximizing expected utility for stochastic combinatorial optimization problems. arXiv:1012.3189v4.

  • Markowitz, H. (1952). Portfolio selection. The Journal of Finance, 7, 77–91.


  • Pratt, J. W. (1964). Risk aversion in the small and in the large. Econometrica, 32, 122–136.


  • Rothblum, U. G. (1974). Multiplicative Markov decision chains. Doctoral dissertation, Department of Operations Research, Stanford University, Stanford, CA.

  • Rothblum, U. G. (1975). Multivariate constant risk posture. Journal of Economic Theory, 10, 309–332.


  • Rothblum, U. G. (1984). Multiplicative Markov decision chains. Mathematics of Operations Research, 9, 6–24.


  • Rothblum, U. G., & Veinott, A. F. (1992). Markov population decision chains. Unpublished manuscript.

  • Von Neumann, J., & Morgenstern, O. (1947). Theory of games and economic behavior (2nd ed.). Princeton: Princeton University Press.

  • White, A. M. (2002). Risk-sensitive Markov population decision processes. Tutorial paper, Department of Management Science and Engineering, Stanford University, Stanford, CA.

  • White, A. M., & Canbolat, P. G. (2012, under review). Finite-horizon Markov population decision chains with constant risk posture.



Author information


Corresponding author

Correspondence to Pelin G. Canbolat.

Additional information

The author’s main work on this paper was done in the Operations Research Area, Department of Management Science and Engineering, Stanford University, at the suggestion of and under the supervision of her advisor, Arthur F. Veinott, Jr., as Chapter 2 of her dissertation (Canbolat 2009). The author is grateful to Arthur F. Veinott, Jr. for suggesting the topic of this paper and for his continued guidance, and to Uriel G. Rothblum for his valuable suggestions.



Cite this article

Canbolat, P.G. Optimal halting policies in Markov population decision chains with constant risk posture. Ann Oper Res 222, 227–237 (2014). https://doi.org/10.1007/s10479-012-1302-3
