Structured policies in the sequential design of experiments

Annals of Operations Research

Abstract

A general control model under uncertainty is considered. Using a Bayesian approach and dynamic programming, we investigate structural properties of optimal decision rules. In particular, we show the monotonicity of the total expected reward and of the so-called Gittins index. We extend the stopping rule and the stay-on-a-winner rule, which are well known in bandit problems. Our approach is based on the multivariate likelihood ratio order and TP₂ functions.
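
As an illustration of the TP₂ property mentioned in the abstract, the sketch below checks the standard two-variable definition on a finite grid: a nonnegative function f is TP₂ if f(x1, y1) f(x2, y2) ≥ f(x1, y2) f(x2, y1) whenever x1 < x2 and y1 < y2, that is, if every 2×2 minor of its value matrix is nonnegative. The binomial likelihood used as the example kernel, the helper names `is_tp2` and `binomial_likelihood`, and the parameter grid are assumptions chosen for illustration; they are not taken from the article, whose models are more general.

```python
from itertools import combinations
from math import comb


def is_tp2(matrix, tol=1e-12):
    """Return True if every 2x2 minor of `matrix` is nonnegative (up to `tol`).

    Rows and columns are assumed to be listed in increasing order of the
    two arguments of the underlying function.
    """
    rows = range(len(matrix))
    cols = range(len(matrix[0]))
    for i1, i2 in combinations(rows, 2):        # i1 < i2
        for j1, j2 in combinations(cols, 2):    # j1 < j2
            minor = (matrix[i1][j1] * matrix[i2][j2]
                     - matrix[i1][j2] * matrix[i2][j1])
            if minor < -tol:
                return False
    return True


def binomial_likelihood(thetas, n):
    """Hypothetical example kernel: f(theta, x) = C(n, x) theta^x (1 - theta)^(n - x)."""
    return [[comb(n, x) * t**x * (1 - t)**(n - x) for x in range(n + 1)]
            for t in thetas]


# Illustrative grid of success probabilities, in increasing order.
thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
kernel = binomial_likelihood(thetas, n=4)

print(is_tp2(kernel))        # rows ordered by increasing theta
print(is_tp2(kernel[::-1]))  # same kernel with the parameter order reversed
```

For the binomial family the likelihood ratio f(θ2, x) / f(θ1, x) is increasing in x whenever θ1 < θ2, which is the monotone likelihood ratio property; reversing the order of the parameter rows destroys it, so the second check fails.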

References

  1. H. Benzing, K. Hinderer and M. Kolonko, On the k-armed Bernoulli bandit: Monotonicity of the total reward under an arbitrary prior distribution, Math. Operationsforschung Statistik, Ser. Optimization 15(1984)583–595.

  2. H. Benzing and M. Kolonko, Structured policies for a sequential design problem with general distributions, Math. Oper. Res. 12(1987)60–71.

  3. D.A. Berry and B. Fristedt, Bandit Problems (Chapman and Hall, London, 1985).

  4. D.P. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models (Prentice Hall, Englewood Cliffs, NJ, 1987).

  5. K. Hinderer, Foundations of Non-Stationary Dynamic Programming with Discrete Time Parameter (Springer, Berlin, 1970).

  6. S. Karlin and Y. Rinott, Classes of orderings of measures and related correlation inequalities. I. Multivariate totally positive distributions, J. Multivariate Anal. 10(1980)467–498.

  7. M. Kolonko, A note on a general stopping rule in dynamic programming with finite horizon, Statist. Decisions 4(1986)379–387.

  8. U. Rieder, Bayesian dynamic programming, Adv. Appl. Prob. 7(1975)330–348.

  9. U. Rieder, Bayessche Kontrollmodelle, Skript Universität Ulm (1988).

  10. H. Wagner, Strukturuntersuchungen in Bayesschen Semi-Markoffschen Kontrollmodellen, Dissertation, Universität Ulm (1988).

  11. W. Whitt, Multivariate monotone likelihood ratio and uniform conditional stochastic order, J. Appl. Prob. 19(1982)695–701.

  12. P. Whittle, Optimization over Time, Vol. 1 (Wiley, New York, 1982).

Cite this article

Rieder, U., Wagner, H. Structured policies in the sequential design of experiments. Ann Oper Res 32, 165–188 (1991). https://doi.org/10.1007/BF02204833

  • DOI: https://doi.org/10.1007/BF02204833
