
Centralized Optimization for Dec-POMDPs Under the Expected Average Reward Criterion


Abstract:

In this paper, decentralized partially observable Markov decision process (Dec-POMDP) systems with discrete state and action spaces are studied from a gradient point of view. Dec-POMDPs have recently emerged as a promising approach to optimizing multiagent decision making in partially observable stochastic environments. However, the decentralized nature of the Dec-POMDP framework results in the lack of a shared belief state, which makes it impossible for a decision maker to estimate the system state from local information alone. In contrast to belief-based policies, this paper focuses on optimizing decentralized observation-based policies, which are easy to apply and do not suffer from the belief-sharing problem. By analyzing the gradient of the objective function, we develop a centralized stochastic gradient policy iteration algorithm that finds the optimal policy on the basis of gradient estimates obtained from a single sample path. The algorithm requires no restrictive assumptions and can be applied to most practical Dec-POMDP problems. A numerical example is provided to demonstrate the effectiveness of the algorithm.
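To make the idea of observation-based policy gradients from a single sample path concrete, the following is a minimal illustrative sketch, not the authors' algorithm: a toy two-agent Dec-POMDP with randomly generated model tensors (`P`, `O`, `R`), per-agent softmax observation-to-action policies, and a GPOMDP-style likelihood-ratio estimator of the average-reward gradient used as a stand-in for the paper's estimator. All names and constants are assumptions for illustration.

```python
# Hedged sketch: single-sample-path policy-gradient ascent on a toy Dec-POMDP.
# The model (P, O, R) and the GPOMDP-style estimator are illustrative
# assumptions, not the algorithm proposed in the paper.
import numpy as np

rng = np.random.default_rng(0)

# Toy Dec-POMDP: 2 agents, S states, OBS observations and ACT actions per agent.
S, OBS, ACT, AGENTS = 4, 3, 2, 2
P = rng.dirichlet(np.ones(S), size=(S, ACT, ACT))   # P[s, a1, a2] -> next-state dist
O = rng.dirichlet(np.ones(OBS), size=(S, AGENTS))   # O[s, i]      -> obs dist for agent i
R = rng.normal(size=(S, ACT, ACT))                  # immediate reward r(s, a1, a2)

# Observation-based stochastic policies: theta[i, o, :] -> softmax action probabilities.
theta = np.zeros((AGENTS, OBS, ACT))

def act_probs(i, o):
    z = np.exp(theta[i, o] - theta[i, o].max())
    return z / z.sum()

def run_and_estimate_gradient(T=20000, beta=0.95):
    """Roll out one sample path; return the empirical average reward and a
    GPOMDP-style likelihood-ratio estimate of the average-reward gradient."""
    s = rng.integers(S)
    z = np.zeros_like(theta)        # eligibility trace of score functions
    grad = np.zeros_like(theta)
    avg_r = 0.0
    for t in range(1, T + 1):
        score_t = np.zeros_like(theta)
        acts = []
        for i in range(AGENTS):
            o = rng.choice(OBS, p=O[s, i])
            p = act_probs(i, o)
            a = rng.choice(ACT, p=p)
            acts.append(a)
            sc = -p.copy()
            sc[a] += 1.0            # d/dtheta log pi_i(a | o)
            score_t[i, o] += sc
        r = R[s, acts[0], acts[1]]
        z = beta * z + score_t
        grad += (r * z - grad) / t  # running average of r_t * z_t
        avg_r += (r - avg_r) / t
        s = rng.choice(S, p=P[s, acts[0], acts[1]])
    return avg_r, grad

# Centralized gradient ascent on the average reward, one sample path per iteration.
for k in range(15):
    avg_r, g = run_and_estimate_gradient()
    theta += 0.5 * g
    print(f"iter {k:2d}  estimated average reward {avg_r:+.3f}")
```

The design choice to keep here is that each agent's policy is conditioned only on its own current observation, so no belief state is shared; only the gradient estimation and the parameter update are centralized, which mirrors the centralized-optimization, decentralized-execution setting described in the abstract.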
Published in: IEEE Transactions on Automatic Control ( Volume: 62, Issue: 11, November 2017)
Page(s): 6032 - 6038
Date of Publication: 08 May 2017
