Sparse Kernel-Based Least Squares Temporal Difference with Prioritized Sweeping

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9949)

Abstract

How to improve the efficiency of algorithms for solving large-scale or continuous-space reinforcement learning (RL) problems has been an active research topic. The kernel-based least squares temporal difference (KLSTD) algorithm can solve continuous-space RL problems, but it suffers from high computational complexity because of its kernel-based representation and the matrix computations it requires. To address this problem, this paper proposes an algorithm named sparse kernel-based least squares temporal difference with prioritized sweeping (PS-SKLSTD). PS-SKLSTD consists of two parts: learning and planning. In the learning process, we exploit an ALD-based sparse kernel function to represent the value function and update the parameter vectors using the Sherman-Morrison formula. In the planning process, we use prioritized sweeping to select the state-action pair to update. The experimental results demonstrate that PS-SKLSTD achieves better convergence and computational efficiency than KLSTD.
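
As a rough illustration of the mechanics the abstract refers to, the Python sketch below shows an ALD-style test for growing a sparse kernel dictionary, a Sherman-Morrison rank-1 update of an inverse matrix, and a toy priority queue of the kind used by prioritized sweeping. All names (ALDDictionary, gaussian_kernel, nu, and so on) are hypothetical and do not come from the paper; this is a minimal sketch of the general techniques, not the authors' implementation.

# Illustrative sketch (not the paper's code): ALD sparsification,
# Sherman-Morrison update, and a prioritized-sweeping queue.
import heapq
import numpy as np


def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-d.dot(d) / (2.0 * sigma ** 2)))


class ALDDictionary:
    """Approximate linear dependence (ALD) test: a sample is added to the
    dictionary only if it cannot be approximated (within threshold nu) by
    the kernel features of the samples already stored."""

    def __init__(self, kernel=gaussian_kernel, nu=1e-2):
        self.kernel = kernel
        self.nu = nu
        self.samples = []        # current dictionary elements
        self.K_inv = None        # inverse of the dictionary Gram matrix

    def observe(self, x):
        if not self.samples:
            self.samples.append(x)
            self.K_inv = np.array([[1.0 / self.kernel(x, x)]])
            return True
        k_vec = np.array([self.kernel(x, s) for s in self.samples])
        c = self.K_inv @ k_vec                     # least-squares coefficients
        delta = self.kernel(x, x) - k_vec @ c      # ALD residual
        if delta > self.nu:                        # novel enough: grow the dictionary
            n = len(self.samples)
            K_inv_new = np.zeros((n + 1, n + 1))   # block inverse of the enlarged Gram matrix
            K_inv_new[:n, :n] = self.K_inv + np.outer(c, c) / delta
            K_inv_new[:n, n] = -c / delta
            K_inv_new[n, :n] = -c / delta
            K_inv_new[n, n] = 1.0 / delta
            self.K_inv = K_inv_new
            self.samples.append(x)
            return True
        return False


def sherman_morrison_update(A_inv, u, v):
    """Return (A + u v^T)^{-1} given A^{-1}, avoiding a full re-inversion."""
    Au = A_inv @ u
    vA = v @ A_inv
    return A_inv - np.outer(Au, vA) / (1.0 + v @ Au)


# Toy prioritized-sweeping queue: the state-action pair whose Bellman error
# changed most is reprocessed first during planning.  heapq is a min-heap,
# so priorities are negated.
pq = []
heapq.heappush(pq, (-0.9, ("s1", "a0")))   # largest priority -> popped first
heapq.heappush(pq, (-0.1, ("s2", "a1")))
priority, sa_pair = heapq.heappop(pq)
print(sa_pair, -priority)                  # ('s1', 'a0') 0.9

The dictionary update keeps the inverse Gram matrix incrementally, and the Sherman-Morrison step plays the analogous role for the least-squares parameter update, so neither requires a full matrix inversion per sample.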



Acknowledgments

This work was funded by the National Natural Science Foundation of China (61103045, 61272005, 61272244, 61303108, 61373094, 61472262), the Natural Science Foundation of Jiangsu Province (BK2012616), the High School Natural Foundation of Jiangsu (13KJB520020), the Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University (93K172014K04), and the Suzhou Industrial Application of Basic Research Program (SYG201422).

Author information


Corresponding author

Correspondence to Xinghong Ling.


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Sun, C. et al. (2016). Sparse Kernel-Based Least Squares Temporal Difference with Prioritized Sweeping. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol 9949. Springer, Cham. https://doi.org/10.1007/978-3-319-46675-0_25

  • DOI: https://doi.org/10.1007/978-3-319-46675-0_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46674-3

  • Online ISBN: 978-3-319-46675-0

  • eBook Packages: Computer Science (R0)
