DOI: 10.1145/1553374.1553442

Regularization and feature selection in least-squares temporal difference learning

Published: 14 June 2009

Abstract

We consider the task of reinforcement learning with linear value function approximation. Temporal difference algorithms, and in particular the Least-Squares Temporal Difference (LSTD) algorithm, provide a method for learning the parameters of the value function, but when the number of features is large this algorithm can over-fit to the data and is computationally expensive. In this paper, we propose a regularization framework for the LSTD algorithm that overcomes these difficulties. In particular, we focus on the case of l1 regularization, which is robust to irrelevant features and also serves as a method for feature selection. Although the l1 regularized LSTD solution cannot be expressed as a convex optimization problem, we present an algorithm similar to the Least Angle Regression (LARS) algorithm that can efficiently compute the optimal solution. Finally, we demonstrate the performance of the algorithm experimentally.
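For readers unfamiliar with LSTD, the quantity being fit can be made concrete with a short sketch. The snippet below is a minimal NumPy illustration of the standard (unregularized) LSTD solve that the paper's l1-regularized method builds on; the function name, argument names, and sample layout are assumptions made for illustration and are not taken from the paper.

    import numpy as np

    def lstd_weights(phi, phi_next, rewards, gamma, ridge=1e-6):
        # phi      : (n, k) features of visited states        (assumed layout)
        # phi_next : (n, k) features of the successor states  (assumed layout)
        # rewards  : (n,)   observed one-step rewards
        # gamma    : discount factor in [0, 1)
        # Standard LSTD solves A w = b with A = Phi^T (Phi - gamma * Phi')
        # and b = Phi^T r; a tiny ridge term keeps the solve stable when
        # features are nearly collinear.
        k = phi.shape[1]
        A = phi.T @ (phi - gamma * phi_next)
        b = phi.T @ rewards
        return np.linalg.solve(A + ridge * np.eye(k), b)

The paper's contribution replaces this single dense linear solve with an l1-penalized fixed point, computed along a regularization path by a LARS-style algorithm, so that only a sparse subset of the features receives nonzero weight.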



Published In

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009
1331 pages
ISBN:9781605585161
DOI:10.1145/1553374

Sponsors

  • NSF
  • Microsoft Research
  • MITACS

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article

Conference

ICML '09

Acceptance Rates

Overall acceptance rate: 140 of 548 submissions (26%)

