Abstract
Improving the efficiency of algorithms for large-scale or continuous-space reinforcement learning (RL) problems has been an active research topic. The kernel-based least squares temporal difference (KLSTD) algorithm can solve continuous-space RL problems, but it suffers from high computational complexity owing to its kernel representation and expensive matrix computations. To address this problem, this paper proposes sparse kernel-based least squares temporal difference with prioritized sweeping (PS-SKLSTD). PS-SKLSTD consists of two parts: learning and planning. In the learning process, we exploit an ALD-based sparse kernel function to represent the value function and update the parameter vector based on the Sherman-Morrison equation. In the planning process, we use prioritized sweeping to select the next state-action pair to update. Experimental results demonstrate that PS-SKLSTD outperforms KLSTD in both convergence and computational efficiency.
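To make the ALD-based sparsification step concrete, the following Python sketch shows a standard approximate-linear-dependence (ALD) test over a growing kernel dictionary: a sample is admitted only if its kernel feature vector cannot be approximated, up to a threshold, by the features of the samples already stored. This is a minimal illustration, not the paper's implementation; the class name ALDDictionary, the Gaussian kernel, and the threshold nu are assumptions introduced here.

    import numpy as np

    def gaussian_kernel(x, y, sigma=1.0):
        # RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
        diff = np.atleast_1d(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
        return float(np.exp(-(diff @ diff) / (2.0 * sigma ** 2)))

    class ALDDictionary:
        # Online dictionary built with the ALD test: a sample is admitted
        # only if its kernel feature cannot be expressed, up to the
        # threshold nu, by the features of samples already stored.
        def __init__(self, kernel=gaussian_kernel, nu=1e-3):
            self.kernel = kernel
            self.nu = nu
            self.samples = []      # admitted dictionary samples
            self.K_inv = None      # inverse kernel matrix of the dictionary

        def features(self, x):
            # Sparse kernel feature vector [k(x, d_i)] over the dictionary.
            return np.array([self.kernel(x, d) for d in self.samples])

        def try_add(self, x):
            if not self.samples:
                self.samples.append(x)
                self.K_inv = np.array([[1.0 / self.kernel(x, x)]])
                return True
            k_t = self.features(x)
            c = self.K_inv @ k_t                   # best linear combination
            delta = self.kernel(x, x) - k_t @ c    # ALD test value
            if delta <= self.nu:                   # approximately dependent:
                return False                       # keep the dictionary sparse
            n = len(self.samples)                  # novel sample: grow K_inv
            K_inv = np.zeros((n + 1, n + 1))       # block-inverse update
            K_inv[:n, :n] = self.K_inv + np.outer(c, c) / delta
            K_inv[:n, n] = K_inv[n, :n] = -c / delta
            K_inv[n, n] = 1.0 / delta
            self.K_inv = K_inv
            self.samples.append(x)
            return True

A dictionary built this way keeps the feature dimension, and hence the cost of the subsequent matrix updates, bounded by the number of admitted samples rather than the number of observed transitions.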
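The Sherman-Morrison update mentioned in the learning step can be sketched as follows for generic feature vectors phi; in PS-SKLSTD these would be the sparse kernel features over the dictionary. The class name RecursiveLSTD and the initialization constant epsilon are hypothetical, but the identity itself, (A + u v^T)^{-1} = A^{-1} - A^{-1} u v^T A^{-1} / (1 + v^T A^{-1} u), is the standard rank-one inverse update used by recursive LSTD methods.

    import numpy as np

    class RecursiveLSTD:
        # Recursive LSTD: maintain P ~= A^{-1} for
        #   A = sum_t phi_t (phi_t - gamma * phi'_t)^T,  b = sum_t r_t phi_t,
        # so the weights theta = P @ b need no matrix solve per step.
        def __init__(self, dim, gamma=0.9, epsilon=1.0):
            self.gamma = gamma
            self.P = np.eye(dim) / epsilon   # P_0 = (epsilon * I)^{-1}
            self.b = np.zeros(dim)
            self.theta = np.zeros(dim)

        def update(self, phi, reward, phi_next):
            u = phi                              # rank-one update: A += u v^T
            v = phi - self.gamma * phi_next
            Pu = self.P @ u
            vP = v @ self.P
            # Sherman-Morrison (assumes the denominator stays away from zero):
            # (A + u v^T)^{-1} = P - (P u)(v^T P) / (1 + v^T P u)
            self.P -= np.outer(Pu, vP) / (1.0 + v @ Pu)
            self.b += reward * phi
            self.theta = self.P @ self.b
            return self.theta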
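For the planning step, prioritized sweeping orders pending backups by the magnitude of their last TD error and propagates priority backwards to predecessor state-action pairs. The sketch below is a minimal Dyna-style queue, assuming hashable states and caller-supplied backup and priority callbacks; PSQueue, theta, and the callback interface are illustrative names, not the paper's API.

    import heapq
    import itertools
    from collections import defaultdict

    class PSQueue:
        # Generic prioritized-sweeping queue: back up state-action pairs
        # in decreasing order of |TD error|, re-queueing predecessors
        # whose priority exceeds the threshold theta.
        def __init__(self, theta=1e-4):
            self.theta = theta
            self._heap = []                   # (-priority, tie-break, s, a)
            self._tie = itertools.count()     # avoids comparing states on ties
            self._preds = defaultdict(set)    # s' -> {(s, a) that reach s'}

        def observe(self, s, a, s_next):
            # Record the experienced transition (s, a) -> s_next.
            self._preds[s_next].add((s, a))

        def push(self, s, a, priority):
            # Only queue pairs whose TD-error magnitude exceeds theta.
            if priority > self.theta:
                heapq.heappush(self._heap, (-priority, next(self._tie), s, a))

        def sweep(self, n, backup, priority_of):
            # backup(s, a) performs one value backup for the pair;
            # priority_of(s, a) returns its current |TD error|.
            for _ in range(n):
                if not self._heap:
                    break
                _, _, s, a = heapq.heappop(self._heap)
                backup(s, a)
                for sp, ap in self._preds[s]:   # propagate priority backwards
                    self.push(sp, ap, priority_of(sp, ap))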
Acknowledgments
This work was funded by the National Natural Science Foundation of China (61103045, 61272005, 61272244, 61303108, 61373094, 61472262), the Natural Science Foundation of Jiangsu Province (BK2012616), the High School Natural Foundation of Jiangsu Province (13KJB520020), the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172014K04), and the Suzhou Industrial Application of Basic Research Program (SYG201422).
Copyright information
© 2016 Springer International Publishing AG
Cite this paper
Sun, C. et al. (2016). Sparse Kernel-Based Least Squares Temporal Difference with Prioritized Sweeping. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol. 9949. Springer, Cham. https://doi.org/10.1007/978-3-319-46675-0_25
Print ISBN: 978-3-319-46674-3
Online ISBN: 978-3-319-46675-0