
Regularized Least Squares Temporal Difference Learning with Nested ℓ2 and ℓ1 Penalization

  • Conference paper
Recent Advances in Reinforcement Learning (EWRL 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7188)


Abstract

The construction of a suitable set of features to approximate value functions is a central problem in reinforcement learning (RL). A popular approach to this problem is to use high-dimensional feature spaces together with least-squares temporal difference learning (LSTD). Although this combination allows for very accurate approximations, it often exhibits poor prediction performance because of overfitting when the number of samples is small compared to the number of features in the approximation space. In the linear regression setting, regularization is commonly used to overcome this problem. In this paper, we review some regularized approaches to policy evaluation and we introduce a novel scheme (L21) which uses ℓ2 regularization in the projection operator and an ℓ1 penalty in the fixed-point step. We show that such a formulation reduces to a standard Lasso problem. As a result, any off-the-shelf solver can be used to compute its solution, and standardization techniques can be applied to the data. We report experimental results showing that L21 is effective in avoiding overfitting and that it compares favorably to existing ℓ1-regularized methods.
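To make the nested formulation concrete, here is a minimal numerical sketch of one plausible reading of the scheme, written in Python with NumPy and scikit-learn's Lasso solver. It assumes the ℓ2-regularized empirical projection Π₂ v = Φ(ΦᵀΦ + nλ₂ I)⁻¹ Φᵀ v and rewrites the penalized fixed-point condition as a Lasso problem with a transformed design matrix; the function name `lstd_l21`, the data shapes, and the exact penalty scaling are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lstd_l21(Phi, Phi_next, r, gamma, lam2, lam1):
    """Sketch of a nested l2/l1 (L21) policy-evaluation step.

    Phi      : (n, d) features of visited states
    Phi_next : (n, d) features of successor states
    r        : (n,)   observed rewards
    gamma    : discount factor
    lam2     : l2 penalty used inside the projection operator
    lam1     : l1 penalty used in the fixed-point step

    Assumption: the l2-regularized empirical projection of a vector v is
    Pi_2 v = Phi (Phi^T Phi + n*lam2*I)^{-1} Phi^T v, so the residual
    Phi w - Pi_2 (r + gamma * Phi_next w) is linear in w and the
    fixed-point step becomes a standard Lasso problem.
    """
    n, d = Phi.shape
    # Ridge-regularized projection onto the span of the features.
    M = np.linalg.solve(Phi.T @ Phi + n * lam2 * np.eye(d), Phi.T)  # (d, n)
    # Rewrite Phi w - Pi_2 (r + gamma Phi' w) as X_tilde w - y_tilde.
    X_tilde = Phi - gamma * (Phi @ (M @ Phi_next))
    y_tilde = Phi @ (M @ r)
    # Any off-the-shelf Lasso solver can now be used on the transformed
    # problem (scikit-learn uses a 1/(2n) scaling of the squared loss).
    lasso = Lasso(alpha=lam1, fit_intercept=False, max_iter=10000)
    lasso.fit(X_tilde, y_tilde)
    return lasso.coef_
```

When λ₁ is large enough, the returned weight vector is sparse, which is the mechanism by which the ℓ1 step limits overfitting in the small-sample regime described above.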






Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hoffman, M.W., Lazaric, A., Ghavamzadeh, M., Munos, R. (2012). Regularized Least Squares Temporal Difference Learning with Nested ℓ2 and ℓ1 Penalization. In: Sanner, S., Hutter, M. (eds) Recent Advances in Reinforcement Learning. EWRL 2011. Lecture Notes in Computer Science (LNAI), vol 7188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29946-9_13


  • DOI: https://doi.org/10.1007/978-3-642-29946-9_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29945-2

  • Online ISBN: 978-3-642-29946-9

  • eBook Packages: Computer Science
