Abstract
Recently, approximate policy iteration (API) has received increasing attention owing to its good convergence and generalization abilities on difficult reinforcement learning (RL) problems; representative methods include least-squares policy iteration (LSPI) and its kernelized version (KLSPI). However, the sparsification of feature vectors, especially kernel-based features, is computationally expensive and strongly affects the performance of API methods. In this paper, a novel rapid sparsification method is proposed for sparsifying kernel machines in API. In this method, the approximation error of a new feature vector is evaluated beforehand in the original space to decide whether it should be added to the current kernel dictionary, so the computational cost is slightly higher when the collected samples are sparse, but remarkably lower when they are dense. Experimental results on the swing-up control of a double-link pendulum verify that the computational cost of the proposed algorithm is lower than that of the previous kernel-based API algorithm, and that this advantage becomes more pronounced as the number of collected samples grows and as the level of sparsification increases.
Supported by the National Natural Science Foundation of China under Grants 61075072 and 90820302, and by the Program for New Century Excellent Talents in University under Grant NCET-10-0901.
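Only the abstract of the paper is reproduced here, but the dictionary-construction idea it describes can be illustrated with a short sketch. The Python code below implements an approximate-linear-dependence (ALD) style sparsification test of the kind commonly used to build kernel dictionaries (following Engel et al.'s kernel recursive least-squares), guarded by a cheap pre-screening test in the original input space, in the spirit of the rapid method the abstract describes. The class name RapidSparsifier, the thresholds nu and d_min, and the Gaussian kernel are illustrative assumptions, not details taken from the paper.

import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # RBF kernel; sigma is an assumed hyperparameter.
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * sigma ** 2))

class RapidSparsifier:
    # Kernel dictionary built with an ALD test, guarded by a cheap
    # distance-based pre-screen in the original input space. nu (ALD
    # threshold) and d_min (input-space distance threshold) are
    # illustrative hyperparameters, not values from the paper.
    def __init__(self, kernel=gaussian_kernel, nu=1e-2, d_min=0.1):
        self.kernel = kernel
        self.nu = nu
        self.d_min = d_min
        self.dictionary = []   # retained samples
        self.K_inv = None      # inverse of the dictionary kernel matrix

    def consider(self, x):
        # Return True if x is added to the dictionary.
        if not self.dictionary:
            self.dictionary.append(x)
            self.K_inv = np.array([[1.0 / self.kernel(x, x)]])
            return True

        # Rapid pre-screen in the original space: a sample lying very
        # close to an existing dictionary element is almost surely well
        # approximated in feature space, so the expensive ALD test is
        # skipped. This is a single O(m) pass over the dictionary.
        if min(np.linalg.norm(x - d) for d in self.dictionary) < self.d_min:
            return False

        # Full ALD test: squared approximation error of phi(x) by the
        # span of the current dictionary in feature space.
        k_vec = np.array([self.kernel(d, x) for d in self.dictionary])
        a = self.K_inv @ k_vec                  # least-squares coefficients
        delta = self.kernel(x, x) - k_vec @ a   # residual approximation error
        if delta <= self.nu:
            return False                        # well approximated: discard

        # Grow the dictionary, updating K_inv with the standard
        # block-matrix inverse formula so no matrix is re-inverted.
        m = len(self.dictionary)
        K_inv_new = np.zeros((m + 1, m + 1))
        K_inv_new[:m, :m] = self.K_inv + np.outer(a, a) / delta
        K_inv_new[:m, m] = -a / delta
        K_inv_new[m, :m] = -a / delta
        K_inv_new[m, m] = 1.0 / delta
        self.K_inv = K_inv_new
        self.dictionary.append(x)
        return True

With this pre-screen, a candidate drawn from a densely sampled region is typically rejected after a single O(m) nearest-neighbour distance computation instead of the O(m^2) ALD test, which is consistent with the abstract's claim that the cost of the proposed method drops markedly as the collected samples become dense.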
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, C., Huang, Z., Xu, X., Zuo, L., Wu, J. (2012). A Rapid Sparsification Method for Kernel Machines in Approximate Policy Iteration. In: Wang, J., Yen, G.G., Polycarpou, M.M. (eds) Advances in Neural Networks – ISNN 2012. ISNN 2012. Lecture Notes in Computer Science, vol 7367. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31346-2_60
DOI: https://doi.org/10.1007/978-3-642-31346-2_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31345-5
Online ISBN: 978-3-642-31346-2