
Online learning for Markov decision processes applied to multi-agent systems


Abstract:

Online learning is the process of providing online control decisions in sequential decision-making problems given (possibly partial) knowledge about the optimal controls for the past decision epochs. The purpose of this paper is to apply online learning techniques to finite-state, finite-action Markov decision processes (finite MDPs). We consider a multi-agent system composed of a learning agent and observed agents. The learning agent observes from the other agents the state probability distribution (pd) resulting from a stationary policy, but not the policy itself. The state pd is observed either directly from an observed agent or through the density distribution of the multi-agent system. We show that, using online learning, the learned policy performs at least as well as that of the observed agents. Specifically, this paper shows that if the observed agents are running an optimal policy, the learning agent can learn optimal average expected cost MDP policies via online learning, using a gradient descent algorithm on the observed agents' pd data.
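
To make the setting concrete, the following is a minimal, hypothetical sketch (not the authors' algorithm): a learning agent sees only the stationary state distribution induced by an observed agent's unknown stationary policy in a small finite MDP, and fits its own softmax-parametrized policy by gradient descent so that the stationary distribution it induces matches the observed one. The toy MDP, the squared-distance objective, and the finite-difference gradients are all illustrative assumptions.

# Illustrative sketch only: match an observed stationary state distribution
# by gradient descent on a softmax policy. All names are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2

# Random transition kernel P[s, a, s'] for a toy finite MDP.
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)

def policy(theta):
    """Softmax policy pi[s, a] from logits theta[s, a]."""
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def stationary_distribution(pi, iters=500):
    """Stationary state distribution of the Markov chain induced by pi."""
    P_pi = np.einsum('sa,sap->sp', pi, P)
    d = np.full(n_states, 1.0 / n_states)
    for _ in range(iters):
        d = d @ P_pi
    return d

# The "observed" pd comes from some unknown stationary policy.
theta_true = rng.normal(size=(n_states, n_actions))
d_obs = stationary_distribution(policy(theta_true))

# Gradient descent (finite-difference gradients, for brevity) on the
# squared distance between the induced and observed state distributions.
def loss(theta):
    return np.sum((stationary_distribution(policy(theta)) - d_obs) ** 2)

theta = np.zeros((n_states, n_actions))
lr, eps = 2.0, 1e-5
for step in range(500):
    grad = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        t = theta.copy()
        t[idx] += eps
        grad[idx] = (loss(t) - loss(theta)) / eps
    theta -= lr * grad

print("observed pd:", np.round(d_obs, 3))
print("learned  pd:", np.round(stationary_distribution(policy(theta)), 3))

In this sketch the learning agent never sees the observed agent's policy or actions, only the state pd it produces, which mirrors the observation model described in the abstract.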
Date of Conference: 12-15 December 2017
Date Added to IEEE Xplore: 22 January 2018
Conference Location: Melbourne, VIC, Australia

