
An analysis of gradient-based policy iteration


Abstract:

Recently, a system theoretic framework for learning and optimization has been developed that shows how many approximate dynamic programming paradigms, such as perturbation analysis, Markov decision processes, and reinforcement learning, are very closely related. Using this system theoretic framework, a new optimization technique called gradient-based policy iteration (GBPI) has been developed. In this paper, we show how GBPI can be extended to partially observable Markov decision processes (POMDPs). We also develop the value iteration analogue of GBPI and show that this new version of value iteration, extended to POMDPs, behaves like value iteration not only theoretically but also numerically.
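For readers unfamiliar with the value iteration setting the abstract refers to, the sketch below illustrates generic belief-state value iteration on a small POMDP. It is not the paper's GBPI algorithm or its value iteration analogue; the two-state model, all probabilities, rewards, and the grid-based interpolation are hypothetical choices made purely for demonstration.

```python
# Illustrative sketch only: grid-based value iteration over the belief space of a
# made-up two-state, two-action, two-observation POMDP. Not the GBPI method.
import numpy as np

gamma = 0.95
T = np.array([[[0.9, 0.1],      # T[a, s, s']: transition probabilities (hypothetical)
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.5, 0.5]]])
O = np.array([[[0.85, 0.15],    # O[a, s', o]: observation probabilities (hypothetical)
               [0.15, 0.85]],
              [[0.5, 0.5],
               [0.5, 0.5]]])
R = np.array([[ 1.0, -1.0],     # R[s, a]: immediate rewards (hypothetical)
              [-1.0,  2.0]])

# Discretize the belief b = P(state = 1) on a grid; interpolate between grid points.
grid = np.linspace(0.0, 1.0, 101)
V = np.zeros_like(grid)

def belief_update(b, a, o):
    """Bayes filter: updated belief P(state' = 1 | b, a, o)."""
    prior = np.array([1.0 - b, b]) @ T[a]   # predicted next-state distribution
    post = O[a, :, o] * prior               # weight by observation likelihood
    return post[1] / post.sum() if post.sum() > 0 else b

for _ in range(200):                        # value iteration sweeps
    V_new = np.empty_like(V)
    for i, b in enumerate(grid):
        bvec = np.array([1.0 - b, b])
        q = np.zeros(T.shape[0])
        for a in range(T.shape[0]):
            q[a] = bvec @ R[:, a]           # expected immediate reward under belief b
            prior = bvec @ T[a]
            for o in range(O.shape[2]):
                p_o = prior @ O[a, :, o]    # P(o | b, a)
                if p_o > 1e-12:
                    b_next = belief_update(b, a, o)
                    q[a] += gamma * p_o * np.interp(b_next, grid, V)
        V_new[i] = q.max()                  # Bellman backup on the belief grid
    if np.max(np.abs(V_new - V)) < 1e-6:
        V = V_new
        break
    V = V_new

print("Approximate value at the uniform belief:", np.interp(0.5, grid, V))
```

Grid discretization with linear interpolation is just one simple way to make belief-space value iteration computable; the paper's framework addresses the same POMDP setting through gradient-based policy iteration instead.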
Date of Conference: 31 July 2005 - 04 August 2005
Date Added to IEEE Xplore: 27 December 2005
Print ISBN: 0-7803-9048-2

Conference Location: Montreal, Que.
