Abstract
This paper concerns infinite-horizon Markov population decision chains with finite state-action space, in which one exerts control over a population of individuals in different states by assigning an action to each individual in the system in each period. In every transition, each individual earns a random reward and generates a random progeny vector. The objective of the decision maker is to maximize expected (infinite-horizon) system utility under the following assumptions: (i) the utility function exhibits constant risk posture, (ii) the (random) progeny vectors of distinct individuals are independent, and (iii) the progeny vectors of individuals in the same state who take the same action are identically distributed. The paper addresses the problem of finding an optimal stationary halting policy and shows that this problem can be solved efficiently by successive approximations on the original state-action space, without enlarging it to include the number of individuals in each state or any other aspect of the system history. The proposed algorithm terminates after a finite number of iterations, either with an optimal stationary halting policy or with a proof that none exists.
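To make the recursion behind the successive-approximations scheme concrete, the following minimal Python sketch iterates the expected-utility operator v(s) ← max_a E[ρ^r ∏_j v(j)^{N_j}] on a toy model. The data format, the risk parameter ρ (here the risk-seeking case u(w) = ρ^w with ρ > 1), and the exact-equality stopping test are illustrative assumptions, not the paper's algorithm or notation.

```python
# A minimal sketch (not the paper's algorithm): successive approximations
# for a toy Markov population decision chain with constant risk posture,
# here the risk-seeking exponential utility u(w) = RHO**w with RHO > 1.
RHO = 1.1

# Assumed data format: model[state][action] = list of (probability, reward,
# progeny) outcomes, where progeny maps successor states to offspring counts.
model = {
    0: {
        "halt": [(1.0, 2.0, {})],                        # earn 2, no progeny
        "grow": [(0.5, 1.0, {1: 1}), (0.5, 1.0, {1: 2})],
    },
    1: {
        "halt": [(1.0, 0.0, {})],
        "stay": [(1.0, -1.0, {1: 1})],
    },
}

def one_step(v):
    """Apply v'(s) = max_a sum_outcomes p * RHO**r * prod_j v(j)**n_j,
    with the convention 0**0 = 1 (an empty progeny contributes factor 1)."""
    new_v, choice = {}, {}
    for s, actions in model.items():
        best_val, best_a = float("-inf"), None
        for a, outcomes in actions.items():
            val = 0.0
            for p, r, progeny in outcomes:
                factor = RHO ** r
                for j, n in progeny.items():
                    factor *= v[j] ** n
                val += p * factor
            if val > best_val:
                best_val, best_a = val, a
        new_v[s], choice[s] = best_val, best_a
    return new_v, choice

# Starting from v = 0, the value after n steps is the best expected utility
# when the whole population must be gone within n periods.  The paper proves
# finite termination on the original state-action space; this sketch simply
# stops when the values repeat exactly, with a cap as a safeguard.
v = {s: 0.0 for s in model}
for n in range(1, 200):
    new_v, policy = one_step(v)
    if new_v == v:
        print(f"stable after {n} iterations:", new_v, policy)
        break
    v = new_v
else:
    print("no stable values within the cap; a halting optimum may not exist")
```

In this toy instance the values stabilize after two iterations, with halting optimal in both states; the greedy actions at the fixed point then form a candidate stationary halting policy.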
Notes
The theory in this paper holds when rewards are allowed to be −∞, with the convention that ∞^0 = 1. This arises in practice when a state is to be avoided at all costs, or when an action is in fact infeasible. However, there is no loss of generality in assuming that no such states exist, since they can be eliminated by preprocessing the data: first, all actions with positive probability of earning a −∞ reward are stripped from the system; then, any state left with no admissible action is deleted, together with all actions in the remaining states that lead to it, and this step is repeated until no such state remains. The problem then becomes that of finding an optimal stationary halting policy in the reduced system.
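For concreteness, here is a minimal sketch of that preprocessing, using the same illustrative data format as the sketch following the abstract; the function name and representation are assumptions, not the paper's.

```python
# A minimal sketch of the preprocessing described in this note, with the
# assumed format model[state][action] = [(prob, reward, progeny)], where
# every listed outcome is taken to have positive probability.
def preprocess(model):
    # Strip every action with positive probability of a -inf reward.
    reduced = {
        s: {a: outs for a, outs in acts.items()
            if all(r != float("-inf") for _, r, _ in outs)}
        for s, acts in model.items()
    }
    # Repeatedly delete states with no admissible action left, together
    # with every remaining action whose progeny can land in a deleted state.
    while True:
        dead = {s for s, acts in reduced.items() if not acts}
        if not dead:
            return reduced
        reduced = {
            s: {a: outs for a, outs in acts.items()
                if all(dead.isdisjoint(progeny) for _, _, progeny in outs)}
            for s, acts in reduced.items()
            if s not in dead
        }
```

Each pass removes at least one state, so the loop terminates; the result is the reduced system on which the halting problem is then solved.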
Whenever 1 follows a row vector, it corresponds to the column vector of ones.
The reader may wonder why we merely seek a policy with maximum lim sup when such a policy is assured to be nearly optimal for the N-period problem only for some, but not all, large N. The answer is that in all cases, we exhibit a stationary optimal policy δ^∞ for which the lim sup in (3) equals the lim inf. Then δ^∞ will be nearly optimal for the N-period problem for all large N.
Rothblum refers to halting policies as nilpotent policies.
Throughout, 0^0 ≡ 1.
The term increasing (decreasing) is used in the weak sense of nondecreasing (nonincreasing).
References
Arrow, K. J. (1965). Aspects of the theory of risk-bearing. Yrjö Jahnsson Lectures, Helsinki.
Canakoglu, E., & Ozekici, S. (2009). Portfolio selection in stochastic markets with exponential utility functions. Annals of Operations Research, 166, 281–297.
Canbolat, P. G. (2009). Markov population decision chains with constant risk posture. Doctoral dissertation, Department of Management Science and Engineering, Stanford University, Stanford, CA.
Cavazos-Cadena, R., & Hernández-Hernández, D. (2005). A characterization of the optimal risk-sensitive average cost in finite controlled Markov chains. The Annals of Applied Probability, 15, 175–212.
Cavazos-Cadena, R., & Montes-de-Oca, R. (2003). The value iteration algorithm in risk-sensitive average Markov decision chains. Mathematics of Operations Research, 28, 752–776.
Choi, S., & Ruszczynski, A. (2011). A multi-product risk-averse newsvendor with exponential utility function. European Journal of Operational Research, 214, 78–84.
Denardo, E. V., & Rothblum, U. G. (1979). Optimal stopping, exponential utility, and linear programming. Mathematical Programming, 16, 228–244.
Denardo, E. V., & Rothblum, U. G. (2006). A turnpike theorem for a risk-sensitive Markov decision process with stopping. SIAM Journal on Control and Optimization, 45, 414–431.
Denardo, E. V., Park, H., & Rothblum, U. G. (2007). Risk-neutral and risk-sensitive multiarmed bandits. Mathematics of Operations Research, 32, 374–394.
Denardo, E. V., Rothblum, U. G., & van der Heyden, L. (2004). Index policies for stochastic search in a forest with an application to R&D project management. Mathematics of Operations Research, 29, 162–181.
Di Masi, G. B., & Stettner, L. (2000). Risk sensitive control of discrete-time Markov processes with infinite horizon. SIAM Journal on Control and Optimization, 38, 61–78.
Erickson, R. E. (1978). Minimum-concave-cost single-source network flows. Doctoral dissertation, Department of Operations Research, Stanford University, Stanford, CA.
Erickson, R. E. (1988). Optimality of stationary halting policies and finite termination of successive approximations. Mathematics of Operations Research, 13, 90–98.
Giri, B. C. (2011). Managing inventory with two suppliers under yield uncertainty and risk aversion. International Journal of Production Economics, 133, 80–85.
Howard, R. A. (1971). Proximal decision analysis. Management Science, 17, 507–541.
Howard, R. A., & Matheson, J. E. (1972). Risk-sensitive Markov decision processes. Management Science, 18, 356–369.
Li, J., & Deshpande, A. (2011). Maximizing expected utility for stochastic combinatorial optimization problems. arXiv:1012.3189v4.
Markowitz, H. (1952). Portfolio selection. The Journal of Finance, 7, 77–91.
Pratt, J. W. (1964). Risk aversion in the small and in the large. Econometrica, 32, 122–136.
Rothblum, U. G. (1974). Multiplicative Markov decision chains. Doctoral dissertation, Department of Operations Research, Stanford University, Stanford, CA.
Rothblum, U. G. (1975). Multivariate constant risk posture. Journal of Economic Theory, 10, 309–332.
Rothblum, U. G. (1984). Multiplicative Markov decision chains. Mathematics of Operations Research, 9, 6–24.
Rothblum, U. G., & Veinott, A. F. (1992). Markov population decision chains. Unpublished manuscript.
Von Neumann, J., & Morgenstern, O. (1947). Theory of games and economic behavior (2nd ed.). Princeton: Princeton University Press.
White, A. M. (2002). Risk-sensitive Markov population decision processes. Tutorial paper, Department of Management Science and Engineering, Stanford University, Stanford, CA.
White, A. M., & Canbolat, P. G. (2012). Finite-horizon Markov population decision chains with constant risk posture. Under review.
Additional information
The author’s main work on this paper was done in the Operations Research Area, Department of Management Science and Engineering, Stanford University, under the supervision of her advisor Arthur F. Veinott, Jr., as Chapter 2 of her dissertation (Canbolat 2009). The author is grateful to Arthur F. Veinott, Jr. for suggesting the topic of this paper and for his continued guidance, and to Uriel G. Rothblum for his valuable suggestions.
Cite this article
Canbolat, P.G. Optimal halting policies in Markov population decision chains with constant risk posture. Ann Oper Res 222, 227–237 (2014). https://doi.org/10.1007/s10479-012-1302-3