Definition
Gaussian process reinforcement learning generically refers to a class of reinforcement learning (RL) algorithms that use Gaussian processes (GPs) to model and learn some aspect of the problem.
Such methods may be divided roughly into two groups:
1. Model-based methods: Here, GPs are used to learn the transition and reward model of the Markov decision process (MDP) underlying the RL problem. The estimated MDP model is then used to compute an approximate solution to the true MDP.
2. Model-free methods: Here, no explicit representation of the MDP is maintained. Rather, GPs are used to learn either the MDP's value function, state–action value function, or some other quantity that may be used to solve the MDP (a concrete example is sketched below).
This entry is concerned with the latter class of methods, as these constitute the majority of published research in this area.
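To make the model-free approach concrete, the following is a minimal sketch of batch Gaussian process temporal difference (GPTD) value estimation in Python. It assumes deterministic transitions, a squared-exponential kernel, and a single sample trajectory; the function name gptd_value_estimate and the hyperparameter defaults are illustrative choices, not prescriptions from this entry. The generative model is r_t = V(x_t) - gamma * V(x_{t+1}) + noise, with a zero-mean GP prior on the value function V, so the posterior mean of V is available in closed form.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential covariance between two sets of states."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gptd_value_estimate(X, r, gamma=0.95, sigma=0.1, lengthscale=1.0):
    """Batch GP value estimate from one trajectory of states X and rewards r.

    Model: r_t = V(x_t) - gamma * V(x_{t+1}) + noise, with a GP prior on V.
    """
    T = X.shape[0]
    # H encodes the temporal-difference relation between successive states.
    H = np.zeros((T - 1, T))
    idx = np.arange(T - 1)
    H[idx, idx] = 1.0
    H[idx, idx + 1] = -gamma
    K = rbf_kernel(X, X, lengthscale)
    # Correlated noise sigma^2 * H H^T, the deterministic-transition variant.
    G = H @ K @ H.T + sigma ** 2 * (H @ H.T)
    alpha = H.T @ np.linalg.solve(G, r)  # dual weights for the posterior mean
    def value(Xq):
        """Posterior mean of V at query states Xq."""
        return rbf_kernel(Xq, X, lengthscale) @ alpha
    return value

# Toy usage: a 20-state chain with unit cost per step.
X = np.linspace(0.0, 1.0, 20)[:, None]
r = -np.ones(19)
v = gptd_value_estimate(X, r, gamma=0.9)
print(v(np.array([[0.0], [0.5], [1.0]])))
```

This batch form costs O(T^3) in the trajectory length because of the linear solve; online and sparse variants of GPTD exist precisely to avoid this cost on long trajectories.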
Motivation and Background
Reinforcement learning is a class of learning problems concerned with achieving long-term goals in unfamiliar,...