A Rapid Sparsification Method for Kernel Machines in Approximate Policy Iteration

  • Conference paper
Advances in Neural Networks – ISNN 2012 (ISNN 2012)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 7367))


Abstract

Approximate policy iteration (API) has recently received increasing attention due to its good convergence and generalization abilities in solving difficult reinforcement learning (RL) problems, e.g., least-squares policy iteration (LSPI) and its kernelized version (KLSPI). However, the sparsification of feature vectors, especially kernel-based features, is computationally expensive and strongly influences the performance of API methods. In this paper, a novel rapid sparsification method is proposed for sparsifying kernel machines in API. In this method, the approximation error of a new feature vector is first computed in the original input space to decide whether it should be added to the current kernel dictionary, so the computational cost is slightly higher when the collected samples are sparse but markedly lower when they are dense. Experimental results on the swing-up control of a double-link pendulum verify that the computational cost of the proposed algorithm is lower than that of the previous kernel-based API algorithm, and that this advantage becomes more pronounced as the number of collected samples and the level of sparsification increase.
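The abstract does not spell out the exact criterion, but the standard sparsification rule in kernel-based API methods such as KLSPI is the approximate linear dependence (ALD) test on the kernel dictionary. The Python sketch below illustrates that setting under stated assumptions: a plain ALD dictionary builder, plus a hypothetical distance-based pre-screen in the original input space that stands in for the idea of computing the approximation error in the original space before touching the kernel matrix. The function names (build_dictionary, ald_test) and the parameters sigma, mu, and dist_threshold are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel evaluated on points from the original input space."""
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * sigma ** 2))

def ald_test(dictionary, x_new, sigma=1.0, mu=0.1):
    """ALD-style test: return True if phi(x_new) cannot be approximated by the
    span of the current dictionary's feature vectors within tolerance mu,
    i.e. x_new should be added to the dictionary."""
    if not dictionary:
        return True
    K = np.array([[gaussian_kernel(xi, xj, sigma) for xj in dictionary]
                  for xi in dictionary])
    k = np.array([gaussian_kernel(xi, x_new, sigma) for xi in dictionary])
    # Approximation error of phi(x_new) projected onto the dictionary span:
    # delta = k(x_new, x_new) - k^T K^{-1} k
    coeffs = np.linalg.solve(K + 1e-8 * np.eye(len(dictionary)), k)
    delta = gaussian_kernel(x_new, x_new, sigma) - k @ coeffs
    return delta > mu

def build_dictionary(samples, sigma=1.0, mu=0.1, dist_threshold=None):
    """Greedy kernel-dictionary sparsification.

    If dist_threshold is given, a cheap pre-screen in the original space
    (minimum Euclidean distance to current dictionary points) rejects
    obviously redundant samples before the costlier ALD test is run."""
    dictionary = []
    for x in samples:
        if dist_threshold is not None and dictionary:
            d_min = min(np.linalg.norm(x - xi) for xi in dictionary)
            if d_min < dist_threshold:
                continue  # close to an existing dictionary point; skip ALD test
        if ald_test(dictionary, x, sigma, mu):
            dictionary.append(x)
    return dictionary
```

The pre-screen costs only pairwise distances in the original space, which is consistent with the abstract's claim: when samples are dense, most candidates are rejected before the cubic-cost ALD solve is ever run, so the overall computation drops, while for sparse samples the extra distance checks add a small overhead.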

Supported by the National Natural Science Foundation of China under Grants 61075072 and 90820302, and the Program for New Century Excellent Talents in University under Grant NCET-10-0901.





Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, C., Huang, Z., Xu, X., Zuo, L., Wu, J. (2012). A Rapid Sparsification Method for Kernel Machines in Approximate Policy Iteration. In: Wang, J., Yen, G.G., Polycarpou, M.M. (eds) Advances in Neural Networks – ISNN 2012. ISNN 2012. Lecture Notes in Computer Science, vol 7367. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31346-2_60

  • DOI: https://doi.org/10.1007/978-3-642-31346-2_60

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31345-5

  • Online ISBN: 978-3-642-31346-2

  • eBook Packages: Computer Science, Computer Science (R0)
