Gaussian Process Reinforcement Learning

Reference work entry in the Encyclopedia of Machine Learning and Data Mining

Definition

Gaussian process reinforcement learning generically refers to a class of reinforcement learning (RL) algorithms that use Gaussian processes (GPs) to model and learn some aspect of the problem.

Such methods may be divided roughly into two groups:

  1. Model-based methods: Here, GPs are used to learn the transition and reward model of the Markov decision process (MDP) underlying the RL problem. The estimated MDP model is then used to compute an approximate solution to the true MDP.

  2. Model-free methods: Here, no explicit representation of the MDP is maintained. Rather, GPs are used to learn the MDP’s value function, its state–action value function, or some other quantity that may be used to solve the MDP.

This entry is concerned with the latter class of methods, as these constitute the majority of published research in this area; a schematic sketch of the model-free idea is given below.
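
To make the model-free idea concrete, the following is a minimal sketch, not taken from this entry, of using GP regression to estimate a state value function: states visited along simulated trajectories serve as inputs, their sampled discounted returns serve as noisy targets, and the GP posterior mean then approximates V(s) while the posterior variance quantifies the remaining uncertainty. The toy random-walk MDP, the squared-exponential kernel, the noise level, and all names are illustrative assumptions; GPTD-style algorithms (Engel et al. 2003, 2005) instead build the GP model from temporal-difference relations between successive states rather than from full Monte Carlo returns.

```python
import numpy as np

rng = np.random.default_rng(0)
GAMMA = 0.95   # discount factor (assumed)
NOISE = 0.1    # assumed observation-noise level for the sampled returns

def rollout(start_state, n_steps=50):
    """Toy random walk on [0, 1] with reward r(s) = s; returns the start
    state together with its (truncated) discounted Monte Carlo return."""
    s, ret, discount = start_state, 0.0, 1.0
    for _ in range(n_steps):
        ret += discount * s
        discount *= GAMMA
        s = float(np.clip(s + rng.normal(0.0, 0.05), 0.0, 1.0))
    return start_state, ret

def sq_exp_kernel(a, b, length=0.2):
    """Squared-exponential covariance between two sets of 1-D states."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

# Training data: start states and their noisy sampled returns.
states, returns = map(np.array, zip(*(rollout(s) for s in rng.uniform(0.0, 1.0, 30))))

# Standard GP-regression posterior over the value function V.
K = sq_exp_kernel(states, states) + NOISE ** 2 * np.eye(len(states))
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, returns))

test = np.linspace(0.0, 1.0, 11)
K_star = sq_exp_kernel(test, states)
v_mean = K_star @ alpha                                        # posterior mean of V
v_var = np.diag(sq_exp_kernel(test, test)) - np.sum(
    np.linalg.solve(L, K_star.T) ** 2, axis=0)                 # posterior variance

for s, m, v in zip(test, v_mean, v_var):
    print(f"V({s:.1f}) ~ {m:5.2f} +/- {np.sqrt(max(v, 0.0)):.2f}")
```

The same machinery extends to state–action value functions by defining the kernel over state–action pairs, which is the form model-free GP RL methods use when no transition model is available.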

Motivation and Background

Reinforcement learning is a class of learning problems concerned with achieving long-term goals in unfamiliar,...


Recommended Reading

  • Bellman RE (1956) A problem in the sequential design of experiments. Sankhya 16:221–229
  • Bellman RE (1957) Dynamic programming. Princeton University Press, Princeton
  • Bertsekas DP (1995) Dynamic programming and optimal control. Athena Scientific, Belmont
  • Bertsekas DP, Tsitsiklis JN (1996) Neuro-dynamic programming. Athena Scientific, Belmont
  • Boyan JA (1999) Least-squares temporal difference learning. In: Proceedings of the 16th international conference on machine learning, Bled. Morgan Kaufmann, San Francisco, pp 49–56
  • Bradtke SJ, Barto AG (1996) Linear least-squares algorithms for temporal difference learning. Mach Learn 22:33–57
  • Dearden R, Friedman N, Andre D (1999) Model based Bayesian exploration. In: Proceedings of the fifteenth conference on uncertainty in artificial intelligence, Stockholm. Morgan Kaufmann, San Francisco, pp 150–159
  • Dearden R, Friedman N, Russell S (1998) Bayesian Q-learning. In: Proceedings of the fifteenth national conference on artificial intelligence, Madison. AAAI, Menlo Park, pp 761–768
  • Duff M (2002) Optimal learning: computational procedures for Bayes-adaptive Markov decision processes. PhD thesis, University of Massachusetts, Amherst
  • Engel Y (2005) Algorithms and representations for reinforcement learning. PhD thesis, The Hebrew University of Jerusalem
  • Engel Y, Mannor S, Meir R (2003) Bayes meets Bellman: the Gaussian process approach to temporal difference learning. In: Proceedings of the 20th international conference on machine learning, Washington, DC. Morgan Kaufmann, San Francisco
  • Engel Y, Mannor S, Meir R (2005) Reinforcement learning with Gaussian processes. In: Proceedings of the 22nd international conference on machine learning, Bonn
  • Engel Y, Szabo P, Volkinshtein D (2005) Learning to control an Octopus arm with Gaussian process temporal difference methods. Technical report, Technion Institute of Technology. www.cs.ualberta.ca/~yaki/reports/octopus.pdf
  • Ghavamzadeh M, Engel Y (2007) Bayesian actor-critic algorithms. In: Ghahramani Z (ed) 24th international conference on machine learning, Corvallis. Omnipress, Corvallis
  • Howard R (1960) Dynamic programming and Markov processes. MIT, Cambridge
  • Kaelbling LP, Littman ML, Cassandra AR (1998) Planning and acting in partially observable stochastic domains. Artif Intell 101:99–134
  • Kushner HJ, Yin CJ (1997) Stochastic approximation algorithms and applications. Springer, Berlin
  • Littman ML (1994) Markov games as a framework for multi-agent reinforcement learning. In: Proceedings of the 11th international conference on machine learning (ICML-94), New Brunswick. Morgan Kaufmann, New Brunswick, pp 157–163
  • Mannor S, Simester D, Sun P, Tsitsiklis JN (2004) Bias and variance in value function estimation. In: Proceedings of the 21st international conference on machine learning, Banff
  • Poupart P, Vlassis NA, Hoey J, Regan K (2006) An analytic solution to discrete Bayesian reinforcement learning. In: Proceedings of the twenty-third international conference on machine learning, Pittsburgh, pp 697–704
  • Puterman ML (1994) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York
  • Rummery G, Niranjan M (1994) On-line Q-learning using connectionist systems. Technical report CUED/F-INFENG/TR 166, Cambridge University Engineering Department
  • Strens M (2000) A Bayesian framework for reinforcement learning. In: Proceedings of the 17th international conference on machine learning, Stanford. Morgan Kaufmann, San Francisco, pp 943–950
  • Sutton RS (1984) Temporal credit assignment in reinforcement learning. PhD thesis, University of Massachusetts, Amherst
  • Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT, Cambridge
  • Tsitsiklis JN, Van Roy B (1996) An analysis of temporal-difference learning with function approximation. Technical report LIDS-P-2322. MIT, Cambridge
  • Wang T, Lizotte D, Bowling M, Schuurmans D (2005) Bayesian sparse sampling for on-line reward optimization. In: Proceedings of the 22nd international conference on machine learning, Bonn. ACM, New York, pp 956–963
  • Watkins CJCH (1989) Learning from delayed rewards. PhD thesis, King’s College, Cambridge

Author information

Correspondence to Yaakov Engel.

Copyright information

© 2017 Springer Science+Business Media New York

About this entry

Cite this entry

Engel, Y. (2017). Gaussian Process Reinforcement Learning. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_109
