DOI: 10.1145/1553374.1553502

Optimistic initialization and greediness lead to polynomial time learning in factored MDPs

Published: 14 June 2009

Abstract

In this paper we propose an algorithm for polynomial-time reinforcement learning in factored Markov decision processes (FMDPs). The factored optimistic initial model (FOIM) algorithm maintains an empirical model of the FMDP in a conventional way and always follows a greedy policy with respect to its model. The only trick is that the model is initialized optimistically. We prove that with suitable initialization (i) FOIM converges to the fixed point of approximate value iteration (AVI); (ii) the number of steps on which the agent makes non-near-optimal decisions (with respect to the solution of AVI) is polynomial in all relevant quantities; and (iii) the per-step costs of the algorithm are also polynomial. To the best of our knowledge, FOIM is the first algorithm with these properties.
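The abstract's recipe (empirical model, greedy planning, optimism confined to the initialization) can be made concrete with a toy sketch. The following is a minimal illustration of the optimistic-initial-model idea on a flat, tabular MDP; it is not the paper's actual algorithm, since FOIM maintains a factored transition model and plans with approximate value iteration, which this sketch replaces with exact value iteration on a small state space. All class and parameter names, and the pseudo-count scheme, are hypothetical.

```python
import numpy as np

class OptimisticInitialModel:
    """Toy optimistic-initial-model learner for a flat, tabular MDP.

    Illustrative only: the paper's FOIM keeps a *factored* model and
    plans with approximate value iteration instead of the exact value
    iteration used here.
    """

    def __init__(self, n_states, n_actions, gamma=0.95, r_max=1.0, m=10):
        self.nS, self.nA, self.gamma = n_states, n_actions, gamma
        # Augment the state space with a fictitious "garden of Eden"
        # state E whose value is the best achievable discounted return.
        # The initial model sends every (s, a) to E with m pseudo-
        # transitions, so every under-explored action looks attractive.
        self.E = n_states
        self.v_eden = r_max / (1.0 - gamma)
        self.counts = np.zeros((n_states, n_actions, n_states + 1))
        self.counts[:, :, self.E] = m
        self.reward_sum = np.zeros((n_states, n_actions))

    def update(self, s, a, r, s_next):
        # Conventional empirical model update: just count what happened.
        self.counts[s, a, s_next] += 1
        self.reward_sum[s, a] += r

    def greedy_action(self, s, n_iters=200):
        # Always act greedily with respect to the current model; the
        # optimism lives entirely in the initialization, not the planner.
        total = self.counts.sum(axis=2, keepdims=True)
        P = self.counts / total                # empirical + pseudo-counts
        R = self.reward_sum / total[:, :, 0]   # mean reward (pseudo-
                                               # transitions contribute 0;
                                               # their pull comes via V[E])
        V = np.full(self.nS + 1, self.v_eden)  # optimistic start
        for _ in range(n_iters):
            Q = R + self.gamma * (P @ V)       # Bellman backup, (nS, nA)
            V[: self.nS] = Q.max(axis=1)
            V[self.E] = self.v_eden            # E stays maximally valued
        return int(np.argmax(Q[s]))
```

As real transitions accumulate, the pseudo-mass on the fictitious state washes out and the greedy policy shifts toward acting on the learned model. This mirrors, loosely, the mechanism the abstract describes: optimism that decays at the right rate is what bounds the number of non-near-optimal steps polynomially.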



Published In

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009, 1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374

Sponsors

  • NSF
  • Microsoft Research
  • MITACS

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article


Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%


Cited By

  • (2024) Online reinforcement learning for condition-based group maintenance using factored Markov decision processes. European Journal of Operational Research, 315(1), 176-190. DOI: 10.1016/j.ejor.2023.11.039
  • (2022) Learning in congestion games with bandit feedback. Proceedings of the 36th International Conference on Neural Information Processing Systems, 11009-11022. DOI: 10.5555/3600270.3601070
  • (2021) Oracle-efficient regret minimization in factored MDPs with unknown structure. Proceedings of the 35th International Conference on Neural Information Processing Systems, 11148-11159. DOI: 10.5555/3540261.3541113
  • (2020) Towards minimax optimal reinforcement learning in factored Markov decision processes. Proceedings of the 34th International Conference on Neural Information Processing Systems, 19896-19907. DOI: 10.5555/3495724.3497394
  • (2019) Information-theoretic confidence bounds for reinforcement learning. Proceedings of the 33rd International Conference on Neural Information Processing Systems, 2461-2470. DOI: 10.5555/3454287.3454508
  • (2019) Sparsified and Twisted Residual Autoencoders. Biologically Inspired Cognitive Architectures 2019, 321-332. DOI: 10.1007/978-3-030-25719-4_41
  • (2016) Cartesian Abstraction Can Yield 'Cognitive Maps'. Procedia Computer Science, 88, 259-271. DOI: 10.1016/j.procs.2016.07.433
  • (2016) Multiple Model Q-Learning for Stochastic Asynchronous Rewards. Journal of Intelligent and Robotic Systems, 81(3-4), 407-422. DOI: 10.1007/s10846-015-0222-2
  • (2016) Estimating Cartesian Compression via Deep Learning. Artificial General Intelligence, 294-304. DOI: 10.1007/978-3-319-41649-6_30
  • (2015) Cost and risk sensitive decision making and control for highway overtaking maneuver. 2015 International Conference on Models and Technologies for Intelligent Transportation Systems (MT-ITS), 87-92. DOI: 10.1109/MTITS.2015.7223241
