Sparse Gradient-Based Direct Policy Search

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 7666))

Abstract

Reinforcement learning is challenging when state and action spaces are continuous. Discretizing the state and action spaces, and adapting that discretization in real time, are critical issues in reinforcement learning problems.

In our contribution we consider adaptive discretization and introduce a sparse gradient-based direct policy search method. We address the issue of efficient state/action selection in gradient-based direct policy search by imposing sparsity through an L1 penalty term. We propose to start learning with a fine discretization of the state space and to induce sparsity via the L1 norm.

We compare the proposed approach to state-of-the-art methods, such as progressive widening Q-learning, which adaptively updates the discretization of the states, and to classic as well as sparse Q-learning with linear function approximation. Our experiments on standard reinforcement learning challenges demonstrate that the proposed approach is efficient.
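The approach can be read as a policy-gradient (REINFORCE-style) update over a finely discretized state space, followed by a sparsity-inducing L1 step that prunes unneeded state/action cells. The sketch below is a minimal illustration under several assumptions that are not taken from the paper: a tabular softmax policy, a gym-like environment interface (env_reset, env_step, discretize are hypothetical names), and a proximal soft-thresholding realization of the L1 penalty. It is not the authors' exact algorithm.

  import numpy as np

  # Minimal sketch: L1-regularized gradient-based direct policy search over a
  # fine state discretization. Interface names (env_reset, env_step, discretize)
  # are illustrative assumptions, not the paper's implementation.

  def soft_threshold(theta, lam):
      """Proximal step for the L1 penalty: shrinks small weights exactly to zero."""
      return np.sign(theta) * np.maximum(np.abs(theta) - lam, 0.0)

  def softmax_policy(theta, s_bin):
      """Tabular softmax policy over the discretized state s_bin."""
      prefs = theta[s_bin] - theta[s_bin].max()   # subtract max for numerical stability
      probs = np.exp(prefs)
      return probs / probs.sum()

  def reinforce_l1(env_reset, env_step, discretize, n_bins, n_actions,
                   episodes=1000, alpha=0.05, lam=0.001, gamma=0.99):
      """REINFORCE with an L1 proximal update; assumes env_step(a) -> (s, r, done)."""
      theta = np.zeros((n_bins, n_actions))
      for _ in range(episodes):
          s, done, traj = env_reset(), False, []
          while not done:
              s_bin = discretize(s)
              a = np.random.choice(n_actions, p=softmax_policy(theta, s_bin))
              s, r, done = env_step(a)
              traj.append((s_bin, a, r))
          # Monte-Carlo returns, then a policy-gradient step per visited state
          G = 0.0
          for s_bin, a, r in reversed(traj):
              G = r + gamma * G
              grad_logpi = -softmax_policy(theta, s_bin)
              grad_logpi[a] += 1.0            # d log pi(a|s) / d theta[s_bin, :]
              theta[s_bin] += alpha * G * grad_logpi
          theta = soft_threshold(theta, alpha * lam)   # sparsity-inducing L1 step
      return theta

The soft-thresholding step drives the weights of rarely useful cells exactly to zero, which is how an L1 penalty can prune a deliberately fine initial discretization.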

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sokolovska, N. (2012). Sparse Gradient-Based Direct Policy Search. In: Huang, T., Zeng, Z., Li, C., Leung, C.S. (eds) Neural Information Processing. ICONIP 2012. Lecture Notes in Computer Science, vol 7666. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34478-7_27

  • DOI: https://doi.org/10.1007/978-3-642-34478-7_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34477-0

  • Online ISBN: 978-3-642-34478-7

  • eBook Packages: Computer Science, Computer Science (R0)
