Abstract
The model-free Least-Squares Policy Iteration (LSPI) method has been applied successfully to control problems in reinforcement learning. LSPI is a promising algorithm that uses a linear approximation architecture to perform policy optimization in the spirit of Q-learning. However, it faces challenging issues in the selection of basis functions and training samples. Inspired by the orthogonal least-squares regression method for selecting the centers of an RBF neural network, a new hybrid learning method for LSPI is proposed in this paper. The suggested method uses simulation as a tool to guide the "feature configuration" process. Results on the learning control of the Cart-Pole system illustrate the effectiveness of the presented method.
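LSPI alternates a least-squares policy-evaluation step (LSTD-Q) over a fixed linear architecture with greedy policy improvement; in the hybrid method described in the abstract, the RBF centers that define that architecture would be selected by an orthogonal least-squares style procedure. The following is a minimal sketch, not the authors' implementation, of the evaluation step with Gaussian RBF features; the function names, the sample format (s, a, r, s'), the width parameter, and the ridge term are illustrative assumptions rather than details taken from the paper.

```python
# Sketch of LSTD-Q policy evaluation with Gaussian RBF features
# (assumed setup; centers would come from an OLS-style selection step).
import numpy as np

def rbf_features(state, action, centers, width, n_actions):
    """Gaussian RBF activations over the state, replicated per discrete action."""
    acts = np.exp(-np.sum((centers - state) ** 2, axis=1) / (2.0 * width ** 2))
    phi = np.zeros(len(centers) * n_actions)
    phi[action * len(centers):(action + 1) * len(centers)] = acts
    return phi

def lstdq(samples, policy, centers, width, n_actions, gamma=0.95):
    """One LSTD-Q step: solve A w = b for the linear Q-function weights."""
    k = len(centers) * n_actions
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, a, r, s_next in samples:  # samples may be collected off-policy
        phi = rbf_features(s, a, centers, width, n_actions)
        phi_next = rbf_features(s_next, policy(s_next), centers, width, n_actions)
        A += np.outer(phi, phi - gamma * phi_next)
        b += r * phi
    return np.linalg.solve(A + 1e-6 * np.eye(k), b)  # small ridge term for stability
```

A full LSPI loop would repeatedly call lstdq with the greedy policy induced by the previous weight vector, stopping when the weights (and hence the policy) no longer change appreciably.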
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, H., Dagli, C.H. (2003). Hybrid Least-Squares Methods for Reinforcement Learning. In: Chung, P.W.H., Hinde, C., Ali, M. (eds) Developments in Applied Artificial Intelligence. IEA/AIE 2003. Lecture Notes in Computer Science(), vol 2718. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45034-3_47
DOI: https://doi.org/10.1007/3-540-45034-3_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40455-2
Online ISBN: 978-3-540-45034-4
eBook Packages: Springer Book Archive