
Improving Gaussian Process Value Function Approximation in Policy Gradient Algorithms

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2011

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6792)


Abstract

The use of value function approximation in reinforcement learning (RL) is widely studied; its most common application is the extension of value-based RL methods to continuous domains. Gradient-based policy search algorithms can also benefit from an estimated value function, since the estimate can be used to reduce the variance of the gradient. In this article we present a new value function approximation method that uses a modified version of Kullback–Leibler (KL) distance based sparse on-line Gaussian process regression. We combine it with Williams' episodic REINFORCE algorithm to reduce the variance of the gradient estimates. A significant computational overhead of the algorithm is caused by the need to completely re-estimate the value function after each gradient update step. To overcome this problem we propose a measure, composed of a KL distance based score and a time-dependent factor, for exchanging obsolete basis vectors with newly acquired measurements. This method leads to a more stable estimate of the action value function and further reduces gradient variance. Performance and convergence comparisons are provided for the described algorithm on a dynamic system control problem with continuous state-action space.
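To make the variance-reduction idea concrete, the sketch below illustrates, in Python, how an estimated action value function can serve as a baseline in Williams' episodic REINFORCE, and how a KL-based score might be combined with a time-dependent factor when choosing which basis vector of a sparse GP to replace. This is an illustrative sketch, not the authors' implementation: the function names (`reinforce_gradient`, `replacement_score`), the additive form of the time penalty, and the `lambda_time` weight are assumptions made for the example.

```python
import numpy as np

def reinforce_gradient(episodes, q_baseline):
    """Episodic REINFORCE gradient estimate with a value-function baseline.

    episodes:   list of trajectories, each a list of
                (state, action, grad_log_pi, reward) tuples.
    q_baseline: callable (state, action) -> estimated action value,
                e.g. the mean of a sparse GP fitted to observed returns.
    """
    grad = None
    for trajectory in episodes:
        rewards = np.array([r for (_, _, _, r) in trajectory])
        for t, (s, a, grad_log_pi, _) in enumerate(trajectory):
            ret = rewards[t:].sum()              # undiscounted return from step t
            advantage = ret - q_baseline(s, a)   # subtracting the baseline lowers variance
            term = advantage * np.asarray(grad_log_pi)
            grad = term if grad is None else grad + term
    return grad / len(episodes)


def replacement_score(kl_score, age, lambda_time=0.1):
    """Combined score for deciding which GP basis vector to exchange.

    kl_score: KL-distance based importance of the basis vector, as produced
              by sparse on-line GP regression.
    age:      number of gradient updates since the basis vector was added.
    The additive time penalty and its weight are assumed for illustration;
    the paper only states that the measure combines a KL-based score with
    a time-dependent factor. The lowest-scoring vector is replaced.
    """
    return kl_score - lambda_time * age
```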


References

  1. Baird, L., Moore, A.: Gradient descent for general reinforcement learning. In: Kearns, M.S., Solla, S.A., Cohn, D.A. (eds.) Advances in Neural Information Processing Systems, vol. 11, pp. 968–974. MIT Press, Cambridge (1998)

  2. Csató, L.: Gaussian Processes – Iterative Sparse Approximation. PhD thesis, Neural Computing Research Group (2002)

  3. Csató, L., Opper, M.: Sparse representation for Gaussian process models. In: Leen, T.K., Dietterich, T.G., Tresp, V. (eds.) Advances in Neural Information Processing Systems, vol. 13, pp. 444–450. MIT Press, Cambridge (2001)

  4. Deisenroth, M.P., Rasmussen, C.E., Peters, J.: Gaussian process dynamic programming. Neurocomputing 72(7–9), 1508–1524 (2009)

  5. Engel, Y., Mannor, S., Meir, R.: Reinforcement learning with Gaussian processes. In: Proceedings of the 22nd International Conference on Machine Learning, pp. 201–208, New York (2005)

  6. Fan, Y., Xu, J., Shelton, C.R.: Importance sampling for continuous time Bayesian networks. Journal of Machine Learning Research 11, 2115–2140 (2010)

  7. Ghavamzadeh, M., Engel, Y.: Bayesian policy gradient algorithms. In: Schölkopf, B., Platt, J., Hoffman, T. (eds.) Advances in Neural Information Processing Systems, vol. 19, pp. 457–464. MIT Press, Cambridge (2007)

  8. Jakab, H.S., Csató, L.: Using Gaussian processes for variance reduction in policy gradient algorithms. In: 8th International Conference on Applied Informatics, Eger, pp. 55–63 (2010)

  9. Peters, J., Schaal, S.: Reinforcement learning of motor skills with policy gradients. Neural Networks 21(4), 682–697 (2008)

  10. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, New York (1994)

  11. Rasmussen, C.E., Kuss, M.: Gaussian processes in reinforcement learning. In: Saul, L.K., Thrun, S., Schölkopf, B. (eds.) Advances in Neural Information Processing Systems, pp. 751–759. MIT Press, Cambridge (2004)

  12. Rasmussen, C.E., Williams, C.: Gaussian Processes for Machine Learning. MIT Press, Cambridge (2006)

  13. Sugiyama, M., Hachiya, H., Towell, C., Vijayakumar, S.: Geodesic Gaussian kernels for value function approximation. Autonomous Robots 25, 287–304 (2008)

  14. Sutton, R.S., McAllester, D.A., Singh, S.P., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Solla, S.A., Leen, T.K., Müller, K.R. (eds.) Advances in Neural Information Processing Systems, pp. 1057–1063. MIT Press, Cambridge (1999)

  15. Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229–256 (1992)


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jakab, H., Csató, L. (2011). Improving Gaussian Process Value Function Approximation in Policy Gradient Algorithms. In: Honkela, T., Duch, W., Girolami, M., Kaski, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2011. ICANN 2011. Lecture Notes in Computer Science, vol 6792. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21738-8_29

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-21738-8_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21737-1

  • Online ISBN: 978-3-642-21738-8

  • eBook Packages: Computer Science (R0)
