Article

Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees

Published: 07 August 2005

Abstract

MDPs are an attractive formalization for planning, but realistic problems often have intractably large state spaces. When we only need a partial policy to get from a fixed start state to a goal, restricting computation to states relevant to this task can make much larger problems tractable. We introduce a new algorithm, Bounded RTDP, which can produce partial policies with strong performance guarantees while only touching a fraction of the state space, even on problems where other algorithms would have to visit the full state space. To do so, Bounded RTDP maintains both upper and lower bounds on the optimal value function. The performance of Bounded RTDP is greatly aided by the introduction of a new technique to efficiently find suitable upper bounds; this technique can also be used to provide informed initialization to a wide range of other planning algorithms.
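To make the bound-guided search concrete, here is a minimal Python sketch of a Bounded-RTDP-style trial loop for a stochastic shortest-path MDP with a fixed start state and goal. All identifiers are illustrative assumptions, as is the crude constant initialization of the upper bound; the paper's own contribution includes a much better technique for computing informed monotone upper bounds.

```python
import random

def brtdp(S, A, P, cost, goal, s0, tau=10.0, tol=1e-3, max_trials=500):
    """Sketch of a Bounded-RTDP-style trial loop (illustrative, not the
    paper's code). vl and vu are lower/upper bounds on the optimal
    cost-to-go V*; trials are guided by the gap vu - vl."""
    vl = {s: 0.0 for s in S}                          # admissible lower bound
    vu = {s: 0.0 if s == goal else 100.0 for s in S}  # crude upper bound

    def q(v, s, a):  # one-step lookahead value of action a under bound v
        return cost[s, a] + sum(p * v[t] for t, p in P[s, a].items())

    for _ in range(max_trials):
        s, traj = s0, []
        while s != goal and vu[s] - vl[s] > tol:
            traj.append(s)
            # Back up both bounds; act greedily on the optimistic lower bound.
            vu[s] = min(q(vu, s, a) for a in A[s])
            a = min(A[s], key=lambda a: q(vl, s, a))
            vl[s] = q(vl, s, a)
            # Sample the successor in proportion to its expected bound gap,
            # steering the trial toward states whose values are uncertain.
            gaps = {t: p * (vu[t] - vl[t]) for t, p in P[s, a].items()}
            B = sum(gaps.values())
            if B < (vu[s0] - vl[s0]) / tau or B < 1e-12:
                break
            s = random.choices(list(gaps), weights=list(gaps.values()))[0]
        # Back up visited states in reverse to propagate new information.
        for s in reversed(traj):
            vu[s] = min(q(vu, s, a) for a in A[s])
            vl[s] = min(q(vl, s, a) for a in A[s])
        if vu[s0] - vl[s0] < tol:
            break
    return vl, vu

# Tiny 5-state chain: 'step' moves one state right for cost 1; from state 0,
# a stochastic 'jump' reaches state 3 with probability 0.5 (else stays put),
# so the optimal cost-to-go from state 0 is 3 via 'jump'.
random.seed(0)
S, goal = range(5), 4
A = {s: ['step'] for s in range(4)}
A[0] = ['step', 'jump']
P = {(s, 'step'): {s + 1: 1.0} for s in range(4)}
P[(0, 'jump')] = {3: 0.5, 0: 0.5}
cost = {sa: 1.0 for sa in P}

vl, vu = brtdp(S, A, P, cost, goal, s0=0)
```

Sampling successors by bound gap is what lets the algorithm touch only the relevant fraction of the state space: once vu and vl agree at the start state, the greedy policy with respect to either bound comes with a performance guarantee, and states whose bounds never mattered are simply never expanded.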


Published In

ICML '05: Proceedings of the 22nd International Conference on Machine Learning
August 2005, 1113 pages
ISBN: 1595931805
DOI: 10.1145/1102351

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 140 of 548 submissions (26%)

Cited By

• (2024) Stop! planner time. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, pp. 20053-20060. DOI: 10.1609/aaai.v38i18.29983. Online publication date: 20-Feb-2024.
• (2024) State ordering and classification for analyzing non-sparse large Markov models. The Journal of Supercomputing 80(18), pp. 26140-26170. DOI: 10.1007/s11227-024-06446-6. Online publication date: 21-Aug-2024.
• (2024) PAC statistical model checking of mean payoff in discrete- and continuous-time MDP. Formal Methods in System Design. DOI: 10.1007/s10703-024-00463-0. Online publication date: 17-Aug-2024.
• (2024) A Lazy Abstraction Algorithm for Markov Decision Processes. Analytical and Stochastic Modelling Techniques and Applications, pp. 81-96. DOI: 10.1007/978-3-031-70753-7_6. Online publication date: 14-Jun-2024.
• (2024) Accurately Computing Expected Visiting Times and Stationary Distributions in Markov Chains. Tools and Algorithms for the Construction and Analysis of Systems, pp. 237-257. DOI: 10.1007/978-3-031-57249-4_12. Online publication date: 5-Apr-2024.
• (2023) Towards Abstraction-based Probabilistic Program Analysis. Acta Cybernetica 26(3), pp. 671-711. DOI: 10.14232/actacyb.298287. Online publication date: 2-Jun-2023.
• (2023) Stopping Criteria for Value Iteration on Stochastic Games with Quantitative Objectives. 2023 38th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pp. 1-14. DOI: 10.1109/LICS56636.2023.10175771. Online publication date: 26-Jun-2023.
• (2023) Robust Almost-Sure Reachability in Multi-Environment MDPs. Tools and Algorithms for the Construction and Analysis of Systems, pp. 508-526. DOI: 10.1007/978-3-031-30823-9_26. Online publication date: 22-Apr-2023.
• (2023) A Practitioner's Guide to MDP Model Checking Algorithms. Tools and Algorithms for the Construction and Analysis of Systems, pp. 469-488. DOI: 10.1007/978-3-031-30823-9_24. Online publication date: 22-Apr-2023.
• (2022) Value iteration for simple stochastic games. Information and Computation 285(PB). DOI: 10.1016/j.ic.2022.104886. Online publication date: 15-Jun-2022.
