
Optimal halting policies in Markov population decision chains with constant risk posture

Annals of Operations Research


Abstract

This paper concerns infinite-horizon Markov population decision chains with finite state-action space, where one exerts control on a population of individuals in different states by assigning an action to each individual in the system in each period. In every transition, each individual earns a random reward and generates a random progeny vector. The objective of the decision maker is to maximize expected (infinite-horizon) system utility under the following assumptions: (i) The utility function exhibits constant risk posture, (ii) the (random) progeny vectors of distinct individuals are independent, and (iii) the progeny vectors of individuals in a state who take the same action are identically distributed. The paper deals with the problem of finding an optimal stationary halting policy and shows that this problem can be solved efficiently using successive approximations with the original state-action space without enlarging it to include information about the population in each state or any other aspect of the system history in a state. The proposed algorithm terminates after a finite number of iterations with an optimal stationary halting policy or with proof of nonexistence.
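The successive-approximations scheme the abstract refers to can be pictured with a small value-iteration sketch. The sketch below is illustrative only: it assumes a risk-seeking exponential utility (so maximizing expected utility amounts to maximizing E[exp(γw)] with γ > 0), assumes the data for each state-action pair are given as a finite list of (probability, reward, progeny) outcomes, and uses a numerical tolerance in place of the paper's exact finite-termination test. All names and data structures here are hypothetical, not the paper's.

```python
import math

# A minimal sketch of successive approximations on the original state space.
# Assumptions (not from the paper): risk-seeking exponential utility with
# coefficient gamma > 0; outcomes[s, a] is a finite list of
# (probability, reward, progeny) triples, where progeny maps successor
# states to individual counts.

def successive_approximations(states, actions, outcomes, gamma,
                              max_iter=1000, tol=1e-10):
    """V[s]: optimal expected utility factor of one individual in state s."""
    V = {s: 1.0 for s in states}              # V_0: halt every individual now
    for _ in range(max_iter):
        V_new = {}
        for s in states:
            best = 1.0                        # halting contributes factor 1
            for a in actions[s]:
                # E[ exp(gamma * r) * prod_j V[j] ** progeny_j ]
                val = 0.0
                for p, r, progeny in outcomes[s, a]:
                    factor = math.exp(gamma * r)
                    for j, n in progeny.items():
                        factor *= V[j] ** n
                    val += p * factor
                best = max(best, val)
            V_new[s] = best
        # The paper's algorithm has an exact finite-termination test; this
        # sketch settles for a numerical tolerance instead.
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            break
        V = V_new
    return V_new
```

Because the progeny vectors of distinct individuals are independent and identically distributed within a state-action pair, the system value factors over individuals, which is why the iteration above can run on the original state space rather than on population counts.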


Notes

  1. The theory in this paper holds true when rewards are allowed to be −∞, with the convention that ∞^0 = 1. This would be the case in practice when a state is to be avoided at all costs, or when an action is in fact infeasible. However, there is no loss of generality in assuming that no such states exist, since they can be eliminated from the system by preprocessing the data. To do this, all actions with positive probability of generating a −∞ reward are first stripped out of the system. If a state is then left with no admissible action, that state is deleted, along with all actions in the remaining states that lead to it. The problem then becomes that of finding an optimal stationary halting policy in the reduced system; a code sketch of this preprocessing appears after these notes.

  2. Whenever 1 follows a row vector, it corresponds to the column vector of ones.

  3. The reader may wonder why we merely seek a policy with maximum lim sup when such a policy is assured to be nearly optimal for the N-period problem only for some, but not all large N. The answer is that in all cases, we exhibit a stationary optimal policy δ for which the lim sup in (3) equals the lim inf. Then δ will be nearly optimal for the N-period problem for all large N.

  4. Rothblum refers to halting policies as nilpotent policies.

  5. Throughout, 0^0 ≡ 1.

  6. The term increasing (decreasing) is used in the weak sense of nondecreasing (nonincreasing).
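The preprocessing described in note 1 is mechanical enough to sketch in code. The fragment below is a hypothetical rendering under the same illustrative data layout as the value-iteration sketch above (finite outcome lists per state-action pair); it is not the paper's procedure verbatim.

```python
# A minimal sketch of the preprocessing in note 1: first strip every action
# with positive probability of a reward of -infinity, then repeatedly delete
# states left with no admissible action, together with all actions in the
# remaining states that can place progeny in a deleted state.  The data
# layout (outcomes[s, a] as (probability, reward, progeny) triples) is an
# assumption for illustration, not the paper's.

def preprocess(states, actions, outcomes):
    # Step 1: drop actions that can generate a -infinity reward.
    admissible = {
        s: [a for a in actions[s]
            if all(r != float("-inf")
                   for p, r, progeny in outcomes[s, a] if p > 0)]
        for s in states
    }
    alive = set(states)
    changed = True
    while changed:
        changed = False
        # Step 2: delete states with no admissible action left ...
        dead = {s for s in alive if not admissible[s]}
        if dead:
            alive -= dead
            changed = True
            # ... and every action in the remaining states that can send
            # progeny into a deleted state; repeat until nothing changes.
            for s in alive:
                kept = [a for a in admissible[s]
                        if all(j in alive
                               for p, r, progeny in outcomes[s, a] if p > 0
                               for j in progeny)]
                admissible[s] = kept
    return alive, admissible
```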

References

  • Arrow, K. J. (1965). Aspects of the theory of risk-bearing. Yrjö Jahnsson Lectures, Helsinki.

  • Canakoglu, E., & Ozekici, S. (2009). Portfolio selection in stochastic markets with exponential utility functions. Annals of Operations Research, 166, 281–297.


  • Canbolat, P. G. (2009). Markov population decision chains with constant risk posture. Doctoral dissertation, Department of Management Science and Engineering, Stanford University, Stanford, CA.

  • Cavazos-Cadena, R., & Montes-de-Oca, R. (2003). The value iteration algorithm in risk-sensitive average Markov decision chains. Mathematics of Operations Research, 28, 752–776.


  • Cavazos-Cadena, R., & Hernández-Hernández, D. (2005). A characterization of the optimal risk-sensitive average cost in finite controlled Markov chains. The Annals of Applied Probability, 15, 175–212.


  • Choi, S., & Ruszczynski, A. (2011). A multi-product risk-averse newsvendor with exponential utility function. European Journal of Operational Research, 214, 78–84.


  • Denardo, E. V., & Rothblum, U. G. (1979). Optimal stopping, exponential utility, and linear programming. Mathematical Programming, 16, 228–244.


  • Denardo, E. V., & Rothblum, U. G. (2006). A turnpike theorem for a risk-sensitive Markov decision process with stopping. SIAM Journal on Control and Optimization, 45, 414–431.


  • Denardo, E. V., Park, H., & Rothblum, U. G. (2007). Risk-neutral and risk-sensitive multiarmed bandits. Mathematics of Operations Research, 32, 374–394.


  • Denardo, E. V., Rothblum, U. G., & van der Heyden, L. (2004). Index policies for stochastic search in a forest with an application to R&D project management. Mathematics of Operations Research, 29, 162–181.


  • Di Masi, G. B., & Stettner, L. (2000). Risk sensitive control of discrete-time Markov processes with infinite horizon. SIAM Journal on Control and Optimization, 38, 61–78.


  • Erickson, R. E. (1978). Minimum-concave-cost single-source network flows. Doctoral dissertation, Department of Operations Research, Stanford University, Stanford, CA.

  • Erickson, R. E. (1988). Optimality of stationary halting policies and finite termination of successive approximations. Mathematics of Operations Research, 13, 90–98.


  • Giri, B. C. (2011). Managing inventory with two suppliers under yield uncertainty and risk aversion. International Journal of Production Economics, 133, 80–85.


  • Howard, R. A. (1971). Proximal decision analysis. Management Science, 17, 507–541.


  • Howard, R. A., & Matheson, J. E. (1972). Risk-sensitive Markov decision processes. Management Science, 18, 356–369.


  • Li, J., & Deshpande, A. (2011). Maximizing expected utility for stochastic combinatorial optimization problems. arXiv:1012.3189v4.

  • Markowitz, H. (1952). Portfolio selection. The Journal of Finance, 7, 77–91.


  • Pratt, J. W. (1964). Risk aversion in the small and in the large. Econometrica, 32, 122–136.


  • Rothblum, U. G. (1974). Multiplicative Markov decision chains. Doctoral dissertation, Department of Operations Research, Stanford University, Stanford, CA.

  • Rothblum, U. G. (1975). Multivariate constant risk posture. Journal of Economic Theory, 10, 309–332.


  • Rothblum, U. G. (1984). Multiplicative Markov decision chains. Mathematics of Operations Research, 9, 6–24.


  • Rothblum, U. G., & Veinott, A. F. (1992). Markov population decision chains. Unpublished manuscript.

  • Von Neumann, J., & Morgenstern, O. (1947). Theory of games and economic behavior (2nd ed.). Princeton: Princeton University Press.

  • White, A. M. (2002). Risk-sensitive Markov population decision processes. Tutorial paper, Department of Management Science and Engineering, Stanford University, Stanford, CA.

  • White, A. M., & Canbolat, P. G. (2012, under review). Finite-horizon Markov population decision chains with constant risk posture.



Author information


Corresponding author

Correspondence to Pelin G. Canbolat.

Additional information

The author’s main work on this paper was done in the Operations Research Area, Department of Management Science and Engineering, Stanford University, at the suggestion of and under the supervision of her advisor, Arthur F. Veinott, Jr., as Chapter 2 of her dissertation (Canbolat 2009). The author is grateful to Arthur F. Veinott, Jr. for suggesting the topic of this paper and for his continued guidance, and to Uriel G. Rothblum for his valuable suggestions.



Cite this article

Canbolat, P.G. Optimal halting policies in Markov population decision chains with constant risk posture. Ann Oper Res 222, 227–237 (2014). https://doi.org/10.1007/s10479-012-1302-3
