Sparse Gradient-Based Direct Policy Search

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 7666))

Abstract

Reinforcement learning is challenging when state and action spaces are continuous. Discretizing the state and action spaces, and adapting that discretization in real time, are critical issues in reinforcement learning problems.

In our contribution we consider adaptive discretization and introduce a sparse gradient-based direct policy search method. We address the issue of efficient state/action selection in gradient-based direct policy search by imposing sparsity through an L1 penalty term. We propose to start learning with a fine discretization of the state space and to induce sparsity via the L1 norm.

We compare the proposed approach to state-of-the-art methods, such as progressive widening Q-learning, which adaptively updates the discretization of the states, and to classic as well as sparse Q-learning with linear function approximation. Our experiments on standard reinforcement learning challenges demonstrate that the proposed approach is efficient.
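The approach can be read as a policy-gradient (REINFORCE-style) update over a finely discretized state space, followed by a sparsity-inducing L1 step that prunes unneeded state/action cells. The sketch below is a minimal illustration under several assumptions that are not taken from the paper: a tabular softmax policy, a gym-like environment interface (env_reset, env_step, discretize are hypothetical names), and a proximal soft-thresholding realization of the L1 penalty. It is not the authors' exact algorithm.

  import numpy as np

  # Minimal sketch: L1-regularized gradient-based direct policy search over a
  # fine state discretization. Interface names (env_reset, env_step, discretize)
  # are illustrative assumptions, not the paper's implementation.

  def soft_threshold(theta, lam):
      """Proximal step for the L1 penalty: shrinks small weights exactly to zero."""
      return np.sign(theta) * np.maximum(np.abs(theta) - lam, 0.0)

  def softmax_policy(theta, s_bin):
      """Tabular softmax policy over the discretized state s_bin."""
      prefs = theta[s_bin] - theta[s_bin].max()   # subtract max for numerical stability
      probs = np.exp(prefs)
      return probs / probs.sum()

  def reinforce_l1(env_reset, env_step, discretize, n_bins, n_actions,
                   episodes=1000, alpha=0.05, lam=0.001, gamma=0.99):
      """REINFORCE with an L1 proximal update; assumes env_step(a) -> (s, r, done)."""
      theta = np.zeros((n_bins, n_actions))
      for _ in range(episodes):
          s, done, traj = env_reset(), False, []
          while not done:
              s_bin = discretize(s)
              a = np.random.choice(n_actions, p=softmax_policy(theta, s_bin))
              s, r, done = env_step(a)
              traj.append((s_bin, a, r))
          # Monte-Carlo returns, then a policy-gradient step per visited state
          G = 0.0
          for s_bin, a, r in reversed(traj):
              G = r + gamma * G
              grad_logpi = -softmax_policy(theta, s_bin)
              grad_logpi[a] += 1.0            # d log pi(a|s) / d theta[s_bin, :]
              theta[s_bin] += alpha * G * grad_logpi
          theta = soft_threshold(theta, alpha * lam)   # sparsity-inducing L1 step
      return theta

The soft-thresholding step drives the weights of rarely useful cells exactly to zero, which is how an L1 penalty can prune a deliberately fine initial discretization.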

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sokolovska, N. (2012). Sparse Gradient-Based Direct Policy Search. In: Huang, T., Zeng, Z., Li, C., Leung, C.S. (eds) Neural Information Processing. ICONIP 2012. Lecture Notes in Computer Science, vol 7666. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34478-7_27

  • DOI: https://doi.org/10.1007/978-3-642-34478-7_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34477-0

  • Online ISBN: 978-3-642-34478-7

  • eBook Packages: Computer Science, Computer Science (R0)
