Abstract
We study a situation in which a number of ongoing production processes each yield a state-dependent standard reward in discrete time. At each time step, at most one of these processes may be selected for improvement; the selected process yields a state-dependent non-standard reward (or cost) in that time step and changes its state according to a Markov chain. We show that this model can be cast as a bandit problem with constructed rewards, and we characterize the optimal policy. Finally, we present a numerical example.
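The abstract does not spell out how the rewards are constructed. The following is one way such a reduction can work. It rests on several assumptions of ours, not necessarily the authors' setup: an infinite horizon with discount factor beta in (0, 1), bounded rewards, and an improved process that earns its non-standard reward c(x) in place of its standard reward r(x). Under these assumptions, you can telescope each process's stream of standard rewards. The total expected discounted reward then equals the constant sum_j r_j(x_j(0)) / (1 - beta) plus the discounted sum, over improvement epochs, of R(x) = c(x) - r(x) + beta / (1 - beta) * (sum_y P(x, y) r(y) - r(x)), where P is the improved process's transition matrix. Unselected processes then act like frozen bandit arms that earn nothing, so an index rule applies. The sketch computes R and Gittins indices, using Whittle's retirement formulation, a standard technique swapped in here for whatever method the paper uses. It then picks the process with the largest positive index. All function names are hypothetical.

```python
import numpy as np


def constructed_reward(P, r, c, beta):
    """Per-improvement reward R(x) of the equivalent bandit arm (illustrative, assumed construction).

    P    : (n, n) transition matrix applied when the process is improved.
    r    : standard reward earned in each state while the process is not improved.
    c    : non-standard reward (negative for a cost) earned in a period of improvement.
    beta : discount factor in (0, 1).
    """
    P, r, c = (np.asarray(a, dtype=float) for a in (P, r, c))
    # Immediate gain of improving instead of producing, plus the discounted
    # perpetual change in standard reward caused by the state transition.
    return c - r + beta / (1.0 - beta) * (P @ r - r)


def gittins_indices(P, R, beta, tol=1e-10):
    """Gittins index of each state via Whittle's retirement formulation.

    For a retirement reward M, V_M = max(M, R + beta * P V_M); the index of x is
    (1 - beta) times the smallest M at which retiring in state x is optimal.
    """
    P, R = np.asarray(P, dtype=float), np.asarray(R, dtype=float)
    n = len(R)
    scale = 1.0 + np.max(np.abs(R)) / (1.0 - beta)  # bound on |V|, used for tolerances
    indices = np.empty(n)
    for x in range(n):
        # The index of any state lies between min(R) and max(R) (per period).
        lo, hi = R.min() / (1.0 - beta), R.max() / (1.0 - beta)
        while hi - lo > tol * scale:
            M = 0.5 * (lo + hi)
            V = np.full(n, M)
            while True:  # value iteration for the optimal-stopping problem
                V_new = np.maximum(M, R + beta * (P @ V))
                converged = np.max(np.abs(V_new - V)) < 1e-12 * scale
                V = V_new
                if converged:
                    break
            if R[x] + beta * (P[x] @ V) > M:
                lo = M  # continuing beats retiring, so the index lies above M
            else:
                hi = M
        indices[x] = (1.0 - beta) * 0.5 * (lo + hi)
    return indices


def select_process(index_tables, states):
    """Process to improve under the index rule, or None if no index is positive.

    Improving nothing is modelled as an idle arm with constructed reward 0,
    whose Gittins index is therefore 0.
    """
    best, best_value = None, 0.0
    for i, (table, s) in enumerate(zip(index_tables, states)):
        if table[s] > best_value:
            best, best_value = i, table[s]
    return best
```

Consider beta = 0.9, P = [[0.5, 0.5], [0, 1]], r = [1, 3] and c = [-2, -2]. Here the constructed rewards are [6, -5], and because state 1 is absorbing, the Gittins indices are also [6, -5]. This process would be improved in state 0 and left alone in state 1. Using bisection on the retirement reward trades speed for simplicity. It is adequate for small illustrative state spaces, not for large ones.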
Manuscript received: December 2000/Final version received: September 2001
About this article
Cite this article
Brock, M., Tind, J. Dynamic productivity improvement in a model with multiple processes. Mathematical Methods of OR 54, 387–393 (2001). https://doi.org/10.1007/s001860100166
DOI: https://doi.org/10.1007/s001860100166