Abstract
Reinforcement learning is challenging when the state and action spaces are continuous. Discretizing these spaces, and adapting the discretization in real time, are critical issues in reinforcement learning problems.
In this contribution we consider adaptive discretization and introduce a sparse gradient-based direct policy search method. We address efficient state/action selection in gradient-based direct policy search by imposing sparsity through an L1 penalty term. We propose to start learning with a fine discretization of the state space and to induce sparsity via the L1 norm.
We compare the proposed approach to state-of-the-art methods, such as progressive widening Q-learning, which adapts the discretization of the states on-line, and to classic as well as sparse Q-learning with linear function approximation. Experiments on standard reinforcement learning benchmarks demonstrate that the proposed approach is efficient.
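To make the sparsity mechanism concrete, the following is a minimal sketch of an L1-penalized policy-gradient update over a discretized state space. It pairs a REINFORCE-style estimator with a softmax policy and a soft-thresholding (proximal) step for the L1 penalty; all names and constants here (theta, LAM, LR, the episode format) are illustrative assumptions, not the paper's exact algorithm.

import numpy as np

N_STATES, N_ACTIONS = 100, 3  # fine discretization of the state space
LAM, LR = 0.01, 0.1           # hypothetical L1 weight and learning rate
theta = np.zeros((N_STATES, N_ACTIONS))  # one parameter per state/action pair

def policy(s):
    # Softmax (Gibbs) policy over the discrete action set in state s.
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def reinforce_l1_update(episode, gamma=0.95):
    # episode: list of (state_index, action_index, reward) tuples.
    G = 0.0
    for s, a, r in reversed(episode):
        G = r + gamma * G
        grad = -policy(s)          # grad of log pi(a|s) w.r.t. theta[s] is e_a - pi(.|s)
        grad[a] += 1.0
        theta[s] += LR * G * grad  # vanilla policy-gradient ascent step
    # Proximal (soft-thresholding) step for the L1 penalty: it drives the
    # parameters of rarely useful state/action pairs to exactly zero.
    np.copyto(theta, np.sign(theta) * np.maximum(np.abs(theta) - LR * LAM, 0.0))

Soft-thresholding is the standard proximal operator of the L1 norm, so this lasso-style step zeros out parameters of states and actions that contribute little to the return, mirroring the state/action selection described in the abstract.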
Cite this paper
Sokolovska, N. (2012). Sparse Gradient-Based Direct Policy Search. In: Huang, T., Zeng, Z., Li, C., Leung, C.S. (eds) Neural Information Processing. ICONIP 2012. Lecture Notes in Computer Science, vol 7666. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34478-7_27