Article

Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees

Published: 07 August 2005

Abstract

MDPs are an attractive formalization for planning, but realistic problems often have intractably large state spaces. When we only need a partial policy to get from a fixed start state to a goal, restricting computation to states relevant to this task can make much larger problems tractable. We introduce a new algorithm, Bounded RTDP, which can produce partial policies with strong performance guarantees while only touching a fraction of the state space, even on problems where other algorithms would have to visit the full state space. To do so, Bounded RTDP maintains both upper and lower bounds on the optimal value function. The performance of Bounded RTDP is greatly aided by the introduction of a new technique to efficiently find suitable upper bounds; this technique can also be used to provide informed initialization to a wide range of other planning algorithms.
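To make the bound-guided search concrete, here is a minimal Python sketch of a Bounded-RTDP-style trial loop for a stochastic shortest-path MDP with a fixed start state and goal. All identifiers are illustrative assumptions, as is the crude constant initialization of the upper bound; the paper's own contribution includes a much better technique for computing informed monotone upper bounds.

```python
import random

def brtdp(S, A, P, cost, goal, s0, tau=10.0, tol=1e-3, max_trials=500):
    """Sketch of a Bounded-RTDP-style trial loop (illustrative, not the
    paper's code). vl and vu are lower/upper bounds on the optimal
    cost-to-go V*; trials are guided by the gap vu - vl."""
    vl = {s: 0.0 for s in S}                          # admissible lower bound
    vu = {s: 0.0 if s == goal else 100.0 for s in S}  # crude upper bound

    def q(v, s, a):  # one-step lookahead value of action a under bound v
        return cost[s, a] + sum(p * v[t] for t, p in P[s, a].items())

    for _ in range(max_trials):
        s, traj = s0, []
        while s != goal and vu[s] - vl[s] > tol:
            traj.append(s)
            # Back up both bounds; act greedily on the optimistic lower bound.
            vu[s] = min(q(vu, s, a) for a in A[s])
            a = min(A[s], key=lambda a: q(vl, s, a))
            vl[s] = q(vl, s, a)
            # Sample the successor in proportion to its expected bound gap,
            # steering the trial toward states whose values are uncertain.
            gaps = {t: p * (vu[t] - vl[t]) for t, p in P[s, a].items()}
            B = sum(gaps.values())
            if B < (vu[s0] - vl[s0]) / tau or B < 1e-12:
                break
            s = random.choices(list(gaps), weights=list(gaps.values()))[0]
        # Back up visited states in reverse to propagate new information.
        for s in reversed(traj):
            vu[s] = min(q(vu, s, a) for a in A[s])
            vl[s] = min(q(vl, s, a) for a in A[s])
        if vu[s0] - vl[s0] < tol:
            break
    return vl, vu

# Tiny 5-state chain: 'step' moves one state right for cost 1; from state 0,
# a stochastic 'jump' reaches state 3 with probability 0.5 (else stays put),
# so the optimal cost-to-go from state 0 is 3 via 'jump'.
random.seed(0)
S, goal = range(5), 4
A = {s: ['step'] for s in range(4)}
A[0] = ['step', 'jump']
P = {(s, 'step'): {s + 1: 1.0} for s in range(4)}
P[(0, 'jump')] = {3: 0.5, 0: 0.5}
cost = {sa: 1.0 for sa in P}

vl, vu = brtdp(S, A, P, cost, goal, s0=0)
```

Sampling successors by bound gap is what lets the algorithm touch only the relevant fraction of the state space: once vu and vl agree at the start state, the greedy policy with respect to either bound comes with a performance guarantee, and states whose bounds never mattered are simply never expanded.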


Published In

ICML '05: Proceedings of the 22nd International Conference on Machine Learning
August 2005, 1113 pages
ISBN: 1595931805
DOI: 10.1145/1102351

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 140 of 548 submissions (26%)

Cited By

• (2024) Stop! planner time. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, pp. 20053-20060. DOI: 10.1609/aaai.v38i18.29983. Online publication date: 20-Feb-2024.
• (2024) State ordering and classification for analyzing non-sparse large Markov models. The Journal of Supercomputing 80(18), pp. 26140-26170. DOI: 10.1007/s11227-024-06446-6. Online publication date: 21-Aug-2024.
• (2024) PAC statistical model checking of mean payoff in discrete- and continuous-time MDP. Formal Methods in System Design. DOI: 10.1007/s10703-024-00463-0. Online publication date: 17-Aug-2024.
• (2024) A Lazy Abstraction Algorithm for Markov Decision Processes. Analytical and Stochastic Modelling Techniques and Applications, pp. 81-96. DOI: 10.1007/978-3-031-70753-7_6. Online publication date: 14-Jun-2024.
• (2024) Accurately Computing Expected Visiting Times and Stationary Distributions in Markov Chains. Tools and Algorithms for the Construction and Analysis of Systems, pp. 237-257. DOI: 10.1007/978-3-031-57249-4_12. Online publication date: 5-Apr-2024.
• (2023) Towards Abstraction-based Probabilistic Program Analysis. Acta Cybernetica 26(3), pp. 671-711. DOI: 10.14232/actacyb.298287. Online publication date: 2-Jun-2023.
• (2023) Stopping Criteria for Value Iteration on Stochastic Games with Quantitative Objectives. 2023 38th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pp. 1-14. DOI: 10.1109/LICS56636.2023.10175771. Online publication date: 26-Jun-2023.
• (2023) Robust Almost-Sure Reachability in Multi-Environment MDPs. Tools and Algorithms for the Construction and Analysis of Systems, pp. 508-526. DOI: 10.1007/978-3-031-30823-9_26. Online publication date: 22-Apr-2023.
• (2023) A Practitioner's Guide to MDP Model Checking Algorithms. Tools and Algorithms for the Construction and Analysis of Systems, pp. 469-488. DOI: 10.1007/978-3-031-30823-9_24. Online publication date: 22-Apr-2023.
• (2022) Value iteration for simple stochastic games. Information and Computation 285(PB). DOI: 10.1016/j.ic.2022.104886. Online publication date: 15-Jun-2022.
