Abstract
Recently, approximate policy iteration (API) has received increasing attention owing to its good convergence and generalization abilities on difficult reinforcement learning (RL) problems; representative methods include least-squares policy iteration (LSPI) and its kernelized version (KLSPI). However, the sparsification of feature vectors, especially kernel-based features, is computationally expensive and strongly affects the performance of API methods. In this paper, a novel rapid sparsification method is proposed for sparsifying kernel machines in API. In this method, the approximation error of a new feature vector is evaluated beforehand in the original space to decide whether it should be added to the current kernel dictionary, so the computational cost is slightly higher when the collected samples are sparse, but remarkably lower when they are dense. Experimental results on the swing-up control of a double-link pendulum verify that the computational cost of the proposed algorithm is lower than that of the previous kernel-based API algorithm, and that this advantage becomes more pronounced as the number of collected samples grows and as the level of sparsification increases.
Supported by the National Natural Science Foundation of China under Grants 61075072 and 90820302, and by the Program for New Century Excellent Talents in University under Grant NCET-10-0901.
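Only the abstract of the paper is reproduced here, but the dictionary-construction idea it describes can be illustrated with a short sketch. The Python code below implements an approximate-linear-dependence (ALD) style sparsification test of the kind commonly used to build kernel dictionaries (following Engel et al.'s kernel recursive least-squares), guarded by a cheap pre-screening test in the original input space, in the spirit of the rapid method the abstract describes. The class name RapidSparsifier, the thresholds nu and d_min, and the Gaussian kernel are illustrative assumptions, not details taken from the paper.

import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # RBF kernel; sigma is an assumed hyperparameter.
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * sigma ** 2))

class RapidSparsifier:
    # Kernel dictionary built with an ALD test, guarded by a cheap
    # distance-based pre-screen in the original input space. nu (ALD
    # threshold) and d_min (input-space distance threshold) are
    # illustrative hyperparameters, not values from the paper.
    def __init__(self, kernel=gaussian_kernel, nu=1e-2, d_min=0.1):
        self.kernel = kernel
        self.nu = nu
        self.d_min = d_min
        self.dictionary = []   # retained samples
        self.K_inv = None      # inverse of the dictionary kernel matrix

    def consider(self, x):
        # Return True if x is added to the dictionary.
        if not self.dictionary:
            self.dictionary.append(x)
            self.K_inv = np.array([[1.0 / self.kernel(x, x)]])
            return True

        # Rapid pre-screen in the original space: a sample lying very
        # close to an existing dictionary element is almost surely well
        # approximated in feature space, so the expensive ALD test is
        # skipped. This is a single O(m) pass over the dictionary.
        if min(np.linalg.norm(x - d) for d in self.dictionary) < self.d_min:
            return False

        # Full ALD test: squared approximation error of phi(x) by the
        # span of the current dictionary in feature space.
        k_vec = np.array([self.kernel(d, x) for d in self.dictionary])
        a = self.K_inv @ k_vec                  # least-squares coefficients
        delta = self.kernel(x, x) - k_vec @ a   # residual approximation error
        if delta <= self.nu:
            return False                        # well approximated: discard

        # Grow the dictionary, updating K_inv with the standard
        # block-matrix inverse formula so no matrix is re-inverted.
        m = len(self.dictionary)
        K_inv_new = np.zeros((m + 1, m + 1))
        K_inv_new[:m, :m] = self.K_inv + np.outer(a, a) / delta
        K_inv_new[:m, m] = -a / delta
        K_inv_new[m, :m] = -a / delta
        K_inv_new[m, m] = 1.0 / delta
        self.K_inv = K_inv_new
        self.dictionary.append(x)
        return True

With this pre-screen, a candidate drawn from a densely sampled region is typically rejected after a single O(m) nearest-neighbour distance computation instead of the O(m^2) ALD test, which is consistent with the abstract's claim that the cost of the proposed method drops markedly as the collected samples become dense.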
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, C., Huang, Z., Xu, X., Zuo, L., Wu, J. (2012). A Rapid Sparsification Method for Kernel Machines in Approximate Policy Iteration. In: Wang, J., Yen, G.G., Polycarpou, M.M. (eds) Advances in Neural Networks – ISNN 2012. ISNN 2012. Lecture Notes in Computer Science, vol 7367. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31346-2_60
DOI: https://doi.org/10.1007/978-3-642-31346-2_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31345-5
Online ISBN: 978-3-642-31346-2