Article DOI: 10.1145/1102351.1102472

Bayesian sparse sampling for on-line reward optimization

Published: 07 August 2005

Abstract

We present an efficient "sparse sampling" technique for approximating Bayes-optimal decision making in reinforcement learning, addressing the well-known exploration-versus-exploitation tradeoff. Our approach combines sparse sampling with Bayesian exploration to achieve improved decision making while controlling computational cost. The idea is to grow a sparse lookahead tree intelligently, by exploiting information in a Bayesian posterior, rather than enumerating action branches (standard sparse sampling) or compensating myopically (value of perfect information). The outcome is a flexible, practical technique for improving action selection in simple reinforcement learning scenarios.
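To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of posterior-guided sparse lookahead for a two-armed Bernoulli bandit with Beta posteriors. The function names (lookahead_value, select_action), the discount factor, and the depth and branching settings are illustrative assumptions; the only point carried over from the abstract is that action branches are chosen from a posterior draw rather than enumerated exhaustively.

# Minimal sketch (not the authors' code) of posterior-guided sparse lookahead
# for a Bernoulli bandit.  Arm posteriors are Beta(1+s, 1+f), where (s, f) are
# observed successes and failures.  Instead of enumerating every arm at every
# node of the lookahead tree, each node expands only the arms that look best
# under a draw from the posterior.
import random

def sample_posterior(counts):
    """Draw one mean-reward estimate per arm from its Beta posterior."""
    return [random.betavariate(1 + s, 1 + f) for s, f in counts]

def lookahead_value(counts, depth, gamma=0.95, branches=2):
    """Estimate the value of a belief state (per-arm success/failure counts)
    with a sparse tree: expand only `branches` arms per node, chosen by a
    posterior sample, and average over the two Bernoulli outcomes."""
    if depth == 0:
        # Leaf: myopic value of the best arm under the posterior mean.
        return max((1 + s) / (2 + s + f) for s, f in counts)
    theta = sample_posterior(counts)
    promising = sorted(range(len(counts)), key=lambda a: theta[a], reverse=True)
    best = 0.0
    for arm in promising[:branches]:
        s, f = counts[arm]
        p = (1 + s) / (2 + s + f)              # posterior-mean success rate
        value = 0.0
        for reward, prob in ((1, p), (0, 1 - p)):
            child = list(counts)
            child[arm] = (s + reward, f + 1 - reward)
            value += prob * (reward + gamma * lookahead_value(child, depth - 1,
                                                              gamma, branches))
        best = max(best, value)
    return best

def select_action(counts, depth=3, gamma=0.95):
    """Pick the arm whose one-step expansion has the highest lookahead value."""
    q = []
    for arm, (s, f) in enumerate(counts):
        p = (1 + s) / (2 + s + f)
        q_arm = 0.0
        for reward, prob in ((1, p), (0, 1 - p)):
            child = list(counts)
            child[arm] = (s + reward, f + 1 - reward)
            q_arm += prob * (reward + gamma * lookahead_value(child, depth - 1))
        q.append(q_arm)
    return max(range(len(counts)), key=lambda a: q[a])

# Toy usage: two arms; the second has better observed statistics.
if __name__ == "__main__":
    random.seed(0)
    counts = [(1, 1), (3, 1)]                  # (successes, failures) per arm
    print("chosen arm:", select_action(counts))

The design choice mirrored here is the one the abstract emphasizes: a draw from the Bayesian posterior, rather than exhaustive enumeration, decides which action branches receive lookahead effort, so the tree stays sparse while still being grown where the posterior suggests value.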


Published In

ICML '05: Proceedings of the 22nd International Conference on Machine Learning
August 2005
1113 pages
ISBN: 1595931805
DOI: 10.1145/1102351

Publisher

Association for Computing Machinery
New York, NY, United States
