Abstract
A popular form of policy evaluation for large Markov decision processes (MDPs) is the least-squares temporal difference (TD) method. Least-squares TD methods handle large MDPs by requiring, as prior knowledge, feature vectors that form a set of basis vectors compressing the system to a tractable size. Model-based methods have largely been ignored in favour of model-free TD algorithms because of two perceived drawbacks: slower computation and larger storage requirements. This paper challenges the perceived advantage of temporal difference methods over model-based methods in three distinct ways. First, it provides a new model-based approximate policy evaluation method that produces solutions faster than Boyan's least-squares TD method. Second, it introduces a new algorithm for deriving basis vectors without any prior knowledge of the system. Third, it introduces an iteratively improving model-based value estimator that can run faster than standard TD methods. All of the algorithms require model storage, but they remain competitive in accuracy with model-free temporal difference methods.
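The full text is not shown in the preview, but the model-based least-squares idea the abstract describes is in the spirit of the maximum-likelihood alternative to TD(λ) investigated by Lu, Patrascu and Schuurmans (see the references below): estimate a transition model and reward vector from sampled trajectories, then solve the projected Bellman equation directly by least squares. The sketch below is an illustrative reconstruction under that assumption, not the paper's exact algorithm; the function name model_based_lspe, the tabular maximum-likelihood model, and the fixed feature matrix phi are hypothetical choices made for the example.

```python
import numpy as np

def model_based_lspe(transitions, n_states, phi, gamma=0.95):
    """Hypothetical sketch: model-based least-squares policy evaluation.

    transitions : list of (s, r, s_next) tuples sampled under the fixed policy
    n_states    : number of states in the MDP
    phi         : (n_states, k) basis/feature matrix
    Returns weights w such that phi @ w approximates the value function V.
    """
    # Step 1: maximum-likelihood tabular model (P_hat, r_hat) from the samples.
    counts = np.zeros((n_states, n_states))
    reward_sums = np.zeros(n_states)
    visits = np.zeros(n_states)
    for s, r, s_next in transitions:
        counts[s, s_next] += 1.0
        reward_sums[s] += r
        visits[s] += 1.0

    P_hat = np.zeros((n_states, n_states))
    r_hat = np.zeros(n_states)
    seen = visits > 0  # unvisited states keep a zero model row
    P_hat[seen] = counts[seen] / visits[seen, None]
    r_hat[seen] = reward_sums[seen] / visits[seen]

    # Step 2: solve the Bellman equation V = r_hat + gamma * P_hat @ V
    # restricted to the span of phi, i.e. least squares on
    #   (phi - gamma * P_hat @ phi) @ w  ~=  r_hat
    A = phi - gamma * (P_hat @ phi)
    w, *_ = np.linalg.lstsq(A, r_hat, rcond=None)
    return w

# Toy usage: a 3-state chain with two random features per state.
rng = np.random.default_rng(0)
phi = rng.standard_normal((3, 2))
transitions = [(0, 1.0, 1), (1, 0.0, 2), (2, 5.0, 0), (0, 1.0, 1)]
w = model_based_lspe(transitions, n_states=3, phi=phi)
print("approximate values:", phi @ w)
```

Solving for w this way replaces repeated TD sweeps with a single k-column regression over the estimated model, which is the flavour of trade-off (model storage in exchange for faster computation) that the abstract discusses.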
References
Sutton, R. S., Barto, A. G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge, Massachusetts (1998)
Singh, S. P., Sutton, R. S.: Reinforcement Learning with Replacing Eligibility Traces. Machine Learning 22 (1996) 123–158
Lu, F., Patrascu, R., Schuurmans, D.: Investigating the Maximum Likelihood Alternative to TD(λ). Proceedings of the 19th ICML (2002) 403–410
Boyan, J. A.: Least-Squares Temporal Difference Learning. Proceedings of the 16th ICML (1999) 49–56
Bradtke, S. J., Barto, A. G.: Linear Least-Squares Algorithms for Temporal Difference Learning. Machine Learning 22 (1996) 33–57
Bellman, R. E.: A Markovian Decision Process. Journal of Mathematics and Mechanics 6 (1957) 679–684
Lagoudakis, M. G., Parr, R.: Model-Free Least-Squares Policy Iteration. NIPS 14 (2001)
Lawson, C. L., Hanson, R. J.: Solving Least Squares Problems. Prentice-Hall, Englewood Cliffs, New Jersey (1974)
Koller, D., Parr, R.: Computing Factored Value Functions for Policies in Structured MDPs. 16th Intl. Joint Conference on Artificial Intelligence (1999) 1332–1339
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lu, F., Schuurmans, D. (2003). Model-Based Least-Squares Policy Evaluation. In: Xiang, Y., Chaib-draa, B. (eds) Advances in Artificial Intelligence. Canadian AI 2003. Lecture Notes in Computer Science, vol 2671. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44886-1_26
DOI: https://doi.org/10.1007/3-540-44886-1_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40300-5
Online ISBN: 978-3-540-44886-0