Abstract
The model-free Least-Squares Policy Iteration (LSPI) method has been used successfully for control problems in reinforcement learning. LSPI is a promising algorithm that uses a linear approximation architecture to achieve policy optimization in the spirit of Q-learning. However, it faces challenges in the selection of basis functions and training samples. Inspired by the orthogonal least-squares regression method for selecting the centers of an RBF neural network, this paper proposes a new hybrid learning method for LSPI. The suggested method uses simulation as a tool to guide the "feature configuration" process. Results on the learning control of the Cart-Pole system illustrate the effectiveness of the presented method.
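The abstract names two standard building blocks without spelling them out: LSPI, which alternates a least-squares policy-evaluation step (LSTDQ) with greedy policy improvement, and orthogonal least-squares (OLS) forward selection of RBF centers. The NumPy sketch below illustrates each under stated assumptions. It is not the authors' hybrid method: the abstract does not say how simulation guides feature configuration, so the OLS target vector `d` (for example, sampled returns) and all function names (`rbf_features`, `ols_select`, `lstdq`, `lspi`) are hypothetical. The sketch also omits terminal-state handling, which a Cart-Pole setup would need.

```python
import numpy as np


def rbf_features(x, centers, width):
    """Gaussian RBF activations: one row per sample, one column per center."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * width ** 2))


def ols_select(P, d, n_select):
    """Greedy orthogonal least-squares selection of regressor columns.

    P: (N, M) candidate regressors (e.g. RBF responses to candidate centers).
    d: (N,) target vector (hypothetical choice). Returns indices in pick order.
    """
    dd = d @ d
    selected, basis = [], []
    if dd == 0:
        return selected
    for _ in range(n_select):
        best_j, best_err, best_w = None, -1.0, None
        for j in range(P.shape[1]):
            if j in selected:
                continue
            p = P[:, j].astype(float)
            # Gram-Schmidt: remove components along already-chosen directions
            w = p - sum(((q @ p) / (q @ q)) * q for q in basis)
            ww = w @ w
            if ww < 1e-12:  # numerically dependent on chosen columns
                continue
            err = (w @ d) ** 2 / (ww * dd)  # error reduction ratio
            if err > best_err:
                best_j, best_err, best_w = j, err, w
        if best_j is None:
            break
        selected.append(best_j)
        basis.append(best_w)
    return selected


def lstdq(phi, phi_next, rewards, gamma, reg=1e-6):
    """Solve A w = b with A = Phi^T (Phi - gamma Phi'), b = Phi^T r."""
    A = phi.T @ (phi - gamma * phi_next) + reg * np.eye(phi.shape[1])
    b = phi.T @ rewards
    return np.linalg.solve(A, b)


def lspi(samples, features, actions, gamma, n_iter=20, tol=1e-6):
    """samples: list of (s, a, r, s_next); features(s, a) -> 1-D array."""
    phi = np.array([features(s, a) for s, a, _, _ in samples])
    r = np.array([rew for _, _, rew, _ in samples], dtype=float)
    w = np.zeros(phi.shape[1])
    for _ in range(n_iter):
        # Policy improvement: greedy action at each next state under current w
        phi_next = np.array([
            features(s2, max(actions, key=lambda a: features(s2, a) @ w))
            for _, _, _, s2 in samples
        ])
        w_new = lstdq(phi, phi_next, r, gamma)  # policy evaluation
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w
```

In a Cart-Pole-style setting, one common way to combine the two pieces is to pick RBF centers from simulated states with `ols_select`, then build a state-action feature vector by placing the RBF block in the slot of the chosen action and zeros elsewhere before calling `lspi`. That layout is a conventional choice, not necessarily the one used in this paper.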
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, H., Dagli, C.H. (2003). Hybrid Least-Squares Methods for Reinforcement Learning. In: Chung, P.W.H., Hinde, C., Ali, M. (eds) Developments in Applied Artificial Intelligence. IEA/AIE 2003. Lecture Notes in Computer Science, vol 2718. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45034-3_47
DOI: https://doi.org/10.1007/3-540-45034-3_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40455-2
Online ISBN: 978-3-540-45034-4
eBook Packages: Springer Book Archive