DOI: 10.1145/1553374.1553442

Regularization and feature selection in least-squares temporal difference learning

Published: 14 June 2009

Abstract

We consider the task of reinforcement learning with linear value function approximation. Temporal difference algorithms, and in particular the Least-Squares Temporal Difference (LSTD) algorithm, provide a method for learning the parameters of the value function, but when the number of features is large this algorithm can over-fit to the data and is computationally expensive. In this paper, we propose a regularization framework for the LSTD algorithm that overcomes these difficulties. In particular, we focus on the case of l1 regularization, which is robust to irrelevant features and also serves as a method for feature selection. Although the l1 regularized LSTD solution cannot be expressed as a convex optimization problem, we present an algorithm similar to the Least Angle Regression (LARS) algorithm that can efficiently compute the optimal solution. Finally, we demonstrate the performance of the algorithm experimentally.
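For readers unfamiliar with LSTD, the quantity being fit can be made concrete with a short sketch. The snippet below is a minimal NumPy illustration of the standard (unregularized) LSTD solve that the paper's l1-regularized method builds on; the function name, argument names, and sample layout are assumptions made for illustration and are not taken from the paper.

    import numpy as np

    def lstd_weights(phi, phi_next, rewards, gamma, ridge=1e-6):
        # phi      : (n, k) features of visited states        (assumed layout)
        # phi_next : (n, k) features of the successor states  (assumed layout)
        # rewards  : (n,)   observed one-step rewards
        # gamma    : discount factor in [0, 1)
        # Standard LSTD solves A w = b with A = Phi^T (Phi - gamma * Phi')
        # and b = Phi^T r; a tiny ridge term keeps the solve stable when
        # features are nearly collinear.
        k = phi.shape[1]
        A = phi.T @ (phi - gamma * phi_next)
        b = phi.T @ rewards
        return np.linalg.solve(A + ridge * np.eye(k), b)

The paper's contribution replaces this single dense linear solve with an l1-penalized fixed point, computed along a regularization path by a LARS-style algorithm, so that only a sparse subset of the features receives nonzero weight.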



Published In

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009
1331 pages
ISBN:9781605585161
DOI:10.1145/1553374

Sponsors

  • NSF
  • Microsoft Research
  • MITACS

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article

Conference

ICML '09

Acceptance Rates

Overall acceptance rate: 140 of 548 submissions (26%)

