DOI: 10.1145/1553374.1553502

Optimistic initialization and greediness lead to polynomial time learning in factored MDPs

Published: 14 June 2009

Abstract

In this paper we propose an algorithm for polynomial-time reinforcement learning in factored Markov decision processes (FMDPs). The factored optimistic initial model (FOIM) algorithm maintains an empirical model of the FMDP in a conventional way and always follows a greedy policy with respect to its model. The only trick is that the model is initialized optimistically. We prove that with suitable initialization (i) FOIM converges to the fixed point of approximate value iteration (AVI); (ii) the number of steps on which the agent makes non-near-optimal decisions (with respect to the solution of AVI) is polynomial in all relevant quantities; and (iii) the per-step costs of the algorithm are also polynomial. To the best of our knowledge, FOIM is the first algorithm with these properties.
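The abstract's recipe (empirical model, greedy planning, optimism confined to the initialization) can be made concrete with a toy sketch. The following is a minimal illustration of the optimistic-initial-model idea on a flat, tabular MDP; it is not the paper's actual algorithm, since FOIM maintains a factored transition model and plans with approximate value iteration, which this sketch replaces with exact value iteration on a small state space. All class and parameter names, and the pseudo-count scheme, are hypothetical.

```python
import numpy as np

class OptimisticInitialModel:
    """Toy optimistic-initial-model learner for a flat, tabular MDP.

    Illustrative only: the paper's FOIM keeps a *factored* model and
    plans with approximate value iteration instead of the exact value
    iteration used here.
    """

    def __init__(self, n_states, n_actions, gamma=0.95, r_max=1.0, m=10):
        self.nS, self.nA, self.gamma = n_states, n_actions, gamma
        # Augment the state space with a fictitious "garden of Eden"
        # state E whose value is the best achievable discounted return.
        # The initial model sends every (s, a) to E with m pseudo-
        # transitions, so every under-explored action looks attractive.
        self.E = n_states
        self.v_eden = r_max / (1.0 - gamma)
        self.counts = np.zeros((n_states, n_actions, n_states + 1))
        self.counts[:, :, self.E] = m
        self.reward_sum = np.zeros((n_states, n_actions))

    def update(self, s, a, r, s_next):
        # Conventional empirical model update: just count what happened.
        self.counts[s, a, s_next] += 1
        self.reward_sum[s, a] += r

    def greedy_action(self, s, n_iters=200):
        # Always act greedily with respect to the current model; the
        # optimism lives entirely in the initialization, not the planner.
        total = self.counts.sum(axis=2, keepdims=True)
        P = self.counts / total                # empirical + pseudo-counts
        R = self.reward_sum / total[:, :, 0]   # mean reward (pseudo-
                                               # transitions contribute 0;
                                               # their pull comes via V[E])
        V = np.full(self.nS + 1, self.v_eden)  # optimistic start
        for _ in range(n_iters):
            Q = R + self.gamma * (P @ V)       # Bellman backup, (nS, nA)
            V[: self.nS] = Q.max(axis=1)
            V[self.E] = self.v_eden            # E stays maximally valued
        return int(np.argmax(Q[s]))
```

As real transitions accumulate, the pseudo-mass on the fictitious state washes out and the greedy policy shifts toward acting on the learned model. This mirrors, loosely, the mechanism the abstract describes: optimism that decays at the right rate is what bounds the number of non-near-optimal steps polynomially.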



Published In

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009, 1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374

Sponsors

  • NSF
  • Microsoft Research
  • MITACS

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article


Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%


Cited By

  • (2024) Online reinforcement learning for condition-based group maintenance using factored Markov decision processes. European Journal of Operational Research, 315(1), 176-190. DOI: 10.1016/j.ejor.2023.11.039
  • (2022) Learning in congestion games with bandit feedback. Proceedings of the 36th International Conference on Neural Information Processing Systems, 11009-11022. DOI: 10.5555/3600270.3601070
  • (2021) Oracle-efficient regret minimization in factored MDPs with unknown structure. Proceedings of the 35th International Conference on Neural Information Processing Systems, 11148-11159. DOI: 10.5555/3540261.3541113
  • (2020) Towards minimax optimal reinforcement learning in factored Markov decision processes. Proceedings of the 34th International Conference on Neural Information Processing Systems, 19896-19907. DOI: 10.5555/3495724.3497394
  • (2019) Information-theoretic confidence bounds for reinforcement learning. Proceedings of the 33rd International Conference on Neural Information Processing Systems, 2461-2470. DOI: 10.5555/3454287.3454508
  • (2019) Sparsified and Twisted Residual Autoencoders. Biologically Inspired Cognitive Architectures 2019, 321-332. DOI: 10.1007/978-3-030-25719-4_41
  • (2016) Cartesian Abstraction Can Yield 'Cognitive Maps'. Procedia Computer Science, 88, 259-271. DOI: 10.1016/j.procs.2016.07.433
  • (2016) Multiple Model Q-Learning for Stochastic Asynchronous Rewards. Journal of Intelligent and Robotic Systems, 81(3-4), 407-422. DOI: 10.1007/s10846-015-0222-2
  • (2016) Estimating Cartesian Compression via Deep Learning. Artificial General Intelligence, 294-304. DOI: 10.1007/978-3-319-41649-6_30
  • (2015) Cost and risk sensitive decision making and control for highway overtaking maneuver. 2015 International Conference on Models and Technologies for Intelligent Transportation Systems (MT-ITS), 87-92. DOI: 10.1109/MTITS.2015.7223241
