Article DOI: 10.1145/1102351.1102472

Bayesian sparse sampling for on-line reward optimization

Published: 07 August 2005

Abstract

We present an efficient "sparse sampling" technique for approximating Bayes-optimal decision making in reinforcement learning, addressing the well-known exploration-versus-exploitation tradeoff. Our approach combines sparse sampling with Bayesian exploration to achieve improved decision making while controlling computational cost. The idea is to grow a sparse lookahead tree intelligently, by exploiting information in a Bayesian posterior, rather than enumerating action branches (standard sparse sampling) or compensating myopically (value of perfect information). The outcome is a flexible, practical technique for improving action selection in simple reinforcement learning scenarios.
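To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of posterior-guided sparse lookahead for a two-armed Bernoulli bandit with Beta posteriors. The function names (lookahead_value, select_action), the discount factor, and the depth and branching settings are illustrative assumptions; the only point carried over from the abstract is that action branches are chosen from a posterior draw rather than enumerated exhaustively.

# Minimal sketch (not the authors' code) of posterior-guided sparse lookahead
# for a Bernoulli bandit.  Arm posteriors are Beta(1+s, 1+f), where (s, f) are
# observed successes and failures.  Instead of enumerating every arm at every
# node of the lookahead tree, each node expands only the arms that look best
# under a draw from the posterior.
import random

def sample_posterior(counts):
    """Draw one mean-reward estimate per arm from its Beta posterior."""
    return [random.betavariate(1 + s, 1 + f) for s, f in counts]

def lookahead_value(counts, depth, gamma=0.95, branches=2):
    """Estimate the value of a belief state (per-arm success/failure counts)
    with a sparse tree: expand only `branches` arms per node, chosen by a
    posterior sample, and average over the two Bernoulli outcomes."""
    if depth == 0:
        # Leaf: myopic value of the best arm under the posterior mean.
        return max((1 + s) / (2 + s + f) for s, f in counts)
    theta = sample_posterior(counts)
    promising = sorted(range(len(counts)), key=lambda a: theta[a], reverse=True)
    best = 0.0
    for arm in promising[:branches]:
        s, f = counts[arm]
        p = (1 + s) / (2 + s + f)              # posterior-mean success rate
        value = 0.0
        for reward, prob in ((1, p), (0, 1 - p)):
            child = list(counts)
            child[arm] = (s + reward, f + 1 - reward)
            value += prob * (reward + gamma * lookahead_value(child, depth - 1,
                                                              gamma, branches))
        best = max(best, value)
    return best

def select_action(counts, depth=3, gamma=0.95):
    """Pick the arm whose one-step expansion has the highest lookahead value."""
    q = []
    for arm, (s, f) in enumerate(counts):
        p = (1 + s) / (2 + s + f)
        q_arm = 0.0
        for reward, prob in ((1, p), (0, 1 - p)):
            child = list(counts)
            child[arm] = (s + reward, f + 1 - reward)
            q_arm += prob * (reward + gamma * lookahead_value(child, depth - 1))
        q.append(q_arm)
    return max(range(len(counts)), key=lambda a: q[a])

# Toy usage: two arms; the second has better observed statistics.
if __name__ == "__main__":
    random.seed(0)
    counts = [(1, 1), (3, 1)]                  # (successes, failures) per arm
    print("chosen arm:", select_action(counts))

The design choice mirrored here is the one the abstract emphasizes: a draw from the Bayesian posterior, rather than exhaustive enumeration, decides which action branches receive lookahead effort, so the tree stays sparse while still being grown where the posterior suggests value.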


Published In

ICML '05: Proceedings of the 22nd International Conference on Machine Learning
August 2005
1113 pages
ISBN: 1595931805
DOI: 10.1145/1102351

Publisher

Association for Computing Machinery
New York, NY, United States
