
Improving Gaussian Process Value Function Approximation in Policy Gradient Algorithms

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2011

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6792)


Abstract

The use of value function approximation in reinforcement learning (RL) is widely studied; its most common application is the extension of value-based RL methods to continuous domains. Gradient-based policy search algorithms can also benefit from an estimated value function, since the estimate can be used to reduce the variance of the gradient. In this article we present a new value function approximation method that uses a modified version of Kullback–Leibler (KL) distance based sparse on-line Gaussian process regression. We combine it with Williams' episodic REINFORCE algorithm to reduce the variance of the gradient estimates. A significant computational overhead of the algorithm is caused by the need to completely re-estimate the value function after each gradient update step. To overcome this problem we propose a measure, composed of a KL distance based score and a time-dependent factor, for exchanging obsolete basis vectors with newly acquired measurements. This method leads to a more stable estimate of the action value function and further reduces gradient variance. Performance and convergence comparisons are provided for the described algorithm on a dynamic system control problem with continuous state-action space.
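To make the variance-reduction idea concrete, the sketch below illustrates, in Python, how an estimated action value function can serve as a baseline in Williams' episodic REINFORCE, and how a KL-based score might be combined with a time-dependent factor when choosing which basis vector of a sparse GP to replace. This is an illustrative sketch, not the authors' implementation: the function names (`reinforce_gradient`, `replacement_score`), the additive form of the time penalty, and the `lambda_time` weight are assumptions made for the example.

```python
import numpy as np

def reinforce_gradient(episodes, q_baseline):
    """Episodic REINFORCE gradient estimate with a value-function baseline.

    episodes:   list of trajectories, each a list of
                (state, action, grad_log_pi, reward) tuples.
    q_baseline: callable (state, action) -> estimated action value,
                e.g. the mean of a sparse GP fitted to observed returns.
    """
    grad = None
    for trajectory in episodes:
        rewards = np.array([r for (_, _, _, r) in trajectory])
        for t, (s, a, grad_log_pi, _) in enumerate(trajectory):
            ret = rewards[t:].sum()              # undiscounted return from step t
            advantage = ret - q_baseline(s, a)   # subtracting the baseline lowers variance
            term = advantage * np.asarray(grad_log_pi)
            grad = term if grad is None else grad + term
    return grad / len(episodes)


def replacement_score(kl_score, age, lambda_time=0.1):
    """Combined score for deciding which GP basis vector to exchange.

    kl_score: KL-distance based importance of the basis vector, as produced
              by sparse on-line GP regression.
    age:      number of gradient updates since the basis vector was added.
    The additive time penalty and its weight are assumed for illustration;
    the paper only states that the measure combines a KL-based score with
    a time-dependent factor. The lowest-scoring vector is replaced.
    """
    return kl_score - lambda_time * age
```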


References

  1. Baird, L., Moore, A.: Gradient descent for general reinforcement learning. In: Kearns, M.S., Solla, S.A., Cohn, D.A. (eds.) Advances in Neural Information Processing Systems, vol. 11, pp. 968–974. MIT Press, Cambridge (1998)

  2. Csató, L.: Gaussian Processes – Iterative Sparse Approximation. PhD thesis, Neural Computing Research Group (2002)

  3. Csató, L., Opper, M.: Sparse representation for Gaussian process models. In: Leen, T.K., Dietterich, T.G., Tresp, V. (eds.) Advances in Neural Information Processing Systems, vol. 13, pp. 444–450. MIT Press, Cambridge (2001)

  4. Deisenroth, M.P., Rasmussen, C.E., Peters, J.: Gaussian process dynamic programming. Neurocomputing 72(7–9), 1508–1524 (2009)

  5. Engel, Y., Mannor, S., Meir, R.: Reinforcement learning with Gaussian processes. In: Proceedings of the 22nd International Conference on Machine Learning, pp. 201–208, New York (2005)

  6. Fan, Y., Xu, J., Shelton, C.R.: Importance sampling for continuous time Bayesian networks. Journal of Machine Learning Research 11, 2115–2140 (2010)

  7. Ghavamzadeh, M., Engel, Y.: Bayesian policy gradient algorithms. In: Schölkopf, B., Platt, J., Hoffman, T. (eds.) Advances in Neural Information Processing Systems, vol. 19, pp. 457–464. MIT Press, Cambridge (2007)

  8. Jakab, H.S., Csató, L.: Using Gaussian processes for variance reduction in policy gradient algorithms. In: 8th International Conference on Applied Informatics, Eger, pp. 55–63 (2010)

  9. Peters, J., Schaal, S.: Reinforcement learning of motor skills with policy gradients. Neural Networks 21(4), 682–697 (2008)

  10. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, New York (1994)

  11. Rasmussen, C.E., Kuss, M.: Gaussian processes in reinforcement learning. In: Saul, L.K., Thrun, S., Schölkopf, B. (eds.) Advances in Neural Information Processing Systems, pp. 751–759. MIT Press, Cambridge (2004)

  12. Rasmussen, C.E., Williams, C.: Gaussian Processes for Machine Learning. MIT Press, Cambridge (2006)

  13. Sugiyama, M., Hachiya, H., Towell, C., Vijayakumar, S.: Geodesic Gaussian kernels for value function approximation. Autonomous Robots 25, 287–304 (2008)

  14. Sutton, R.S., McAllester, D.A., Singh, S.P., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Solla, S.A., Leen, T.K., Müller, K.R. (eds.) Advances in Neural Information Processing Systems, pp. 1057–1063. MIT Press, Cambridge (1999)

  15. Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229–256 (1992)


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jakab, H., Csató, L. (2011). Improving Gaussian Process Value Function Approximation in Policy Gradient Algorithms. In: Honkela, T., Duch, W., Girolami, M., Kaski, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2011. ICANN 2011. Lecture Notes in Computer Science, vol 6792. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21738-8_29

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-21738-8_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21737-1

  • Online ISBN: 978-3-642-21738-8

  • eBook Packages: Computer Science (R0)
