Predictive representations for policy gradient in POMDPs

Published: 14 June 2009

Abstract

We consider the problem of estimating the policy gradient in Partially Observable Markov Decision Processes (POMDPs) with a special class of policies based on Predictive State Representations (PSRs). We compare PSR policies to Finite-State Controllers (FSCs), the standard policy model for policy gradient methods in POMDPs. We present a general Actor-Critic algorithm for learning both FSCs and PSR policies. The critic computes a value function whose variables are the parameters of the policy; these parameters are gradually updated to maximize the value function. We show that the value function is polynomial for both FSCs and PSR policies, with a potentially smaller degree in the case of PSR policies. The value function of a PSR policy can therefore have fewer local optima than that of the equivalent FSC, and the gradient algorithm is consequently more likely to converge to a globally optimal solution.
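To make the setup concrete, the sketch below combines the two ingredients the abstract describes: a predictive state updated from action-observation pairs, and a parameterised policy that reads that state instead of an FSC's internal state. It is an illustrative Python toy, not the paper's Actor-Critic algorithm; the PSR update matrices, the stand-in environment, and the REINFORCE-style Monte Carlo return used in place of a learned critic are all hypothetical placeholders.

import numpy as np

rng = np.random.default_rng(0)
n_tests, n_actions, n_obs = 3, 2, 2   # sizes of the toy PSR (all hypothetical)

# Hypothetical PSR parameters: one positive update matrix M[a, o] per
# (action, observation) pair, plus a normalised initial predictive state.
M = rng.uniform(0.1, 1.0, size=(n_actions, n_obs, n_tests, n_tests))
b0 = np.full(n_tests, 1.0 / n_tests)

def psr_update(b, a, o):
    """PSR state update: b' = M_ao b, renormalised to sum to one."""
    v = M[a, o] @ b
    return v / v.sum()

def policy(theta, b):
    """Softmax policy over actions, linear in the predictive state b."""
    logits = theta @ b                 # theta has shape (n_actions, n_tests)
    e = np.exp(logits - logits.max())
    return e / e.sum()

theta = np.zeros((n_actions, n_tests))
alpha, gamma, horizon = 0.05, 0.95, 20

for episode in range(500):
    b, grads, rewards = b0.copy(), [], []
    for t in range(horizon):
        pi = policy(theta, b)
        a = rng.choice(n_actions, p=pi)
        o = rng.integers(n_obs)        # toy stand-in for the POMDP's observation
        r = 1.0 if a == 0 else 0.0     # toy reward that simply favours action 0
        g = -np.outer(pi, b)           # score function: d log pi(a|b) / d theta
        g[a] += b
        grads.append(g)
        rewards.append(r)
        b = psr_update(b, a, o)
    G = 0.0                            # Monte Carlo return, REINFORCE update
    for t in reversed(range(horizon)):
        G = rewards[t] + gamma * G
        theta += alpha * G * grads[t]

In the paper's construction the critic instead expresses the value function as a polynomial in the policy parameters, which is what permits the degree comparison between FSC and PSR policies.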

Cited By

  • (2021) Constrained representation learning for recurrent policy optimisation under uncertainty. Adaptive Behavior, 29(3), 253-265. DOI: 10.1177/1059712319891641. Online publication date: 1-Jun-2021.
  • (2020) Learning Transition Models with Time-delayed Causal Relations. 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 8087-8093. DOI: 10.1109/IROS45743.2020.9340809. Online publication date: 24-Oct-2020.
  • (2012) Predictively Defined Representations of State. In Reinforcement Learning, 415-439. DOI: 10.1007/978-3-642-27645-3_13. Online publication date: 2012.

Published In

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009, 1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374

Sponsors

  • NSF
  • Microsoft Research
  • MITACS

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 140 of 548 submissions (26%)
