Abstract:
One often encounters the curse of dimensionality when applying dynamic programming to determine optimal policies for controlled Markov chains. In this paper, we provide a method to construct sub-optimal policies, along with a bound on the deviation of such a policy from the optimum, through the use of restricted linear programming. The novelty of this approach lies in circumventing the need for value iteration or a linear program defined on the entire state-space. Instead, the state-space is partitioned based on the reward structure, and the optimal cost-to-go or value function is approximated by a constant over each partition. We associate a meta-state with each partition, where the transition probabilities between these meta-states can be derived from the original Markov chain specification. The state-aggregation approach results in a significant reduction in the computational burden and lends itself to a restricted linear program defined on the aggregated state-space. Finally, the proposed method is benchmarked on a perimeter-surveillance stochastic control problem.
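
The abstract describes the computational step only at a high level, so the following is a minimal sketch of the restricted linear-programming idea on a toy discounted MDP: the value function is forced to be constant over each block of a state partition, so the LP has one variable per meta-state instead of one per state. The toy problem size, the partition, the discount factor, and the uniform state-relevance weights are all illustrative assumptions, not the paper's formulation.

```python
# Sketch of a restricted LP for a discounted MDP (assumptions noted inline).
import numpy as np
from scipy.optimize import linprog

gamma = 0.9                      # discount factor (assumed)
n_states, n_actions = 6, 2       # toy problem size (assumed)
rng = np.random.default_rng(0)

# Random transition matrices P[a] and rewards r[a] for the toy MDP.
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
r = rng.uniform(0.0, 1.0, size=(n_actions, n_states))

# Partition of the state space into meta-states (assumed grouping).
blocks = [[0, 1], [2, 3], [4, 5]]
Phi = np.zeros((n_states, len(blocks)))          # aggregation matrix: V approx Phi @ w
for j, block in enumerate(blocks):
    Phi[block, j] = 1.0

# Exact LP for comparison: minimize c^T V  s.t.  V >= r_a + gamma * P_a V  for all a.
c = np.ones(n_states) / n_states                 # state-relevance weights (assumed uniform)
A_ub = np.vstack([gamma * P[a] - np.eye(n_states) for a in range(n_actions)])
b_ub = np.concatenate([-r[a] for a in range(n_actions)])
exact = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * n_states)

# Restricted LP on the aggregated state space: substitute V = Phi @ w,
# leaving one decision variable per meta-state.
c_agg = Phi.T @ c
A_agg = A_ub @ Phi
agg = linprog(c_agg, A_ub=A_agg, b_ub=b_ub, bounds=[(None, None)] * len(blocks))

V_exact = exact.x
V_agg = Phi @ agg.x                              # piecewise-constant approximation
print("exact V      :", np.round(V_exact, 3))
print("aggregated V :", np.round(V_agg, 3))
print("max deviation:", np.max(np.abs(V_exact - V_agg)))
```

In this sketch the aggregated LP has three variables instead of six; for a realistic chain the reduction is from the full state-space to the number of meta-states. The printed maximum deviation is only the observed gap on this toy instance; the paper derives a bound on the deviation of the resulting sub-optimal policy from the optimum.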
Published in: 49th IEEE Conference on Decision and Control (CDC)
Date of Conference: 15-17 December 2010
Date Added to IEEE Xplore: 22 February 2011