Abstract
We consider the framework of a set of recently proposed two-timescale actor-critic algorithms for reinforcement learning (RL) under the long-run average-reward criterion with linear, feature-based value-function approximation. The actor and critic updates are based on stochastic policy-gradient ascent and temporal-difference learning, respectively. Unlike conventional RL algorithms, policy-gradient-based algorithms guarantee convergence even with value-function approximation, but they suffer from high variance in the policy-gradient estimator. To reduce this variance for an existing algorithm, we derive a novel stochastic-gradient-based critic update. We propose a novel baseline structure for minimizing the variance of an estimator and derive the optimal baseline, which drives the covariance matrix to zero, the best achievable. Using this optimal baseline, we derive a novel actor update for an existing algorithm, and a second novel actor update for an unbiased policy-gradient estimator that we deduce from the Policy-Gradient Theorem with Function Approximation. We also obtain a novel variance-minimization-based interpretation of an existing algorithm. Computational results demonstrate that the proposed algorithms outperform the state of the art on Garnet problems.
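The abstract gives no pseudocode; as a rough orientation to the setting it describes, the following is a minimal sketch of a generic two-timescale average-reward actor-critic with a linear TD critic and a baseline-adjusted policy-gradient actor. It is not the paper's variance-minimizing update: the function names (softmax_policy, env_step), the feature interface, and the step sizes alpha, beta, kappa are all illustrative assumptions.

```python
import numpy as np

def softmax_policy(theta, sa_feats):
    """Boltzmann policy from per-action features sa_feats (A x d)."""
    prefs = sa_feats @ theta
    prefs -= prefs.max()                 # shift for numerical stability
    p = np.exp(prefs)
    return p / p.sum()

def actor_critic_step(theta, v, avg_r, s_feats, sa_feats, env_step,
                      alpha=0.001, beta=0.01, kappa=0.01):
    """One transition of the coupled updates; alpha << beta is the
    two-timescale condition (slow actor, fast critic)."""
    pi = softmax_policy(theta, sa_feats)
    a = np.random.choice(len(pi), p=pi)
    # env_step is an assumed interface: it returns the reward plus the
    # state features and per-action features of the next state.
    r, s_feats_next, sa_feats_next = env_step(a)

    # Critic: average-reward TD(0) with linear features.
    delta = r - avg_r + v @ s_feats_next - v @ s_feats
    avg_r = avg_r + kappa * (r - avg_r)  # running average-reward estimate
    v = v + beta * delta * s_feats

    # Actor: policy-gradient ascent. The TD error serves as a
    # baseline-adjusted advantage estimate, reducing gradient variance.
    score = sa_feats[a] - pi @ sa_feats  # grad of log pi(a|s) for softmax
    theta = theta + alpha * delta * score
    return theta, v, avg_r, s_feats_next, sa_feats_next
```

The structural point is the step-size separation: the critic tracks the value function on the fast timescale while the actor moves slowly enough to see a nearly converged critic. The paper's contributions concern replacing the generic baseline in such a scheme with a variance-optimal one.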
Copyright information
© 2010 Springer Science+Business Media B.V.
About this paper
Cite this paper
Awate, Y.P. (2010). Actor-Critic Algorithms for Variance Minimization. In: Iskander, M., Kapila, V., Karim, M. (eds) Technological Developments in Education and Automation. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-3656-8_82
DOI: https://doi.org/10.1007/978-90-481-3656-8_82
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-3655-1
Online ISBN: 978-90-481-3656-8