Abstract
A general control model under uncertainty is considered. Using a Bayesian approach and dynamic programming, we investigate structural properties of optimal decision rules. In particular, we show the monotonicity of the total expected reward and of the so-called Gittins index. We also extend the stopping rule and the stay-on-a-winner rule, which are well known in bandit problems. Our approach is based on the multivariate likelihood ratio order and TP₂ functions.
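To make "Bayesian approach and dynamic programming" concrete, here is a minimal sketch of the simplest special case, not the general model of the paper. It assumes a two-armed Bernoulli bandit with independent Beta(a_i, b_i) priors, an undiscounted finite horizon, and a reward of 1 per success. The names `value`, `q_value` and `optimal_arms` are hypothetical. The posterior Beta(a + 1, b) dominates Beta(a, b) in the likelihood ratio order, and Beta(a, b) dominates Beta(a, b + 1). The monotonicity of the total expected reward stated above should therefore show up as `value` being nondecreasing in each a_i and nonincreasing in each b_i. Exact `Fraction` arithmetic keeps such comparisons, and ties between arms, free of rounding error.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical illustration: two-armed Bernoulli bandit, independent Beta(a_i, b_i)
# priors on the success probabilities, no discounting, n pulls remaining.
# A success on arm i updates its posterior to Beta(a_i + 1, b_i),
# a failure updates it to Beta(a_i, b_i + 1).


@lru_cache(maxsize=None)
def value(a1: int, b1: int, a2: int, b2: int, n: int) -> Fraction:
    """Maximal total expected number of successes in the n remaining pulls."""
    if n == 0:
        return Fraction(0)
    # Bellman equation: pick the arm with the larger expected total reward.
    return max(q_value(arm, a1, b1, a2, b2, n) for arm in (1, 2))


def q_value(arm: int, a1: int, b1: int, a2: int, b2: int, n: int) -> Fraction:
    """Expected total reward of pulling `arm` now and acting optimally afterwards."""
    if arm == 1:
        p = Fraction(a1, a1 + b1)  # posterior mean of arm 1 = predictive P(success)
        win = value(a1 + 1, b1, a2, b2, n - 1)
        lose = value(a1, b1 + 1, a2, b2, n - 1)
    else:
        p = Fraction(a2, a2 + b2)  # posterior mean of arm 2
        win = value(a1, b1, a2 + 1, b2, n - 1)
        lose = value(a1, b1, a2, b2 + 1, n - 1)
    return p * (1 + win) + (1 - p) * lose


def optimal_arms(a1: int, b1: int, a2: int, b2: int, n: int) -> set:
    """All arms attaining the maximum in the Bellman equation (exact ties kept)."""
    q = {arm: q_value(arm, a1, b1, a2, b2, n) for arm in (1, 2)}
    best = max(q.values())
    return {arm for arm, v in q.items() if v == best}
```

For example, with uniform priors on both arms and two pulls left, `value(1, 1, 1, 1, 2)` returns `Fraction(13, 12)` and `optimal_arms(1, 1, 1, 1, 2)` returns `{1, 2}`. Recursing over the posterior parameters directly, rather than over the observation history, uses the fact that (a_i, b_i) is a sufficient statistic for Beta-Bernoulli updating.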
Rieder, U., Wagner, H. Structured policies in the sequential design of experiments. Ann Oper Res 32, 165–188 (1991). https://doi.org/10.1007/BF02204833