Abstract
In reinforcement learning, policy evaluation aims to predict the long-term value of a state under a given policy. As high-dimensional representations become increasingly common in reinforcement learning, reducing the computational cost of policy evaluation has become a significant problem. Many recent works adopt matrix sketching methods to accelerate least-squares temporal difference (TD) algorithms and quasi-Newton TD algorithms. Among these sketching methods, truncated incremental SVD performs well because it is stable and efficient. However, the convergence properties of incremental SVD remain open. In this paper, we first show that conventional incremental SVD algorithms can incur enormous approximation errors in the worst case. We then propose a variant of incremental SVD with better theoretical guarantees, obtained by periodically shrinking the singular values. Moreover, we employ our improved incremental SVD to accelerate least-squares TD and quasi-Newton TD algorithms. Experimental results verify the correctness and effectiveness of our methods.
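To make the core idea concrete, the following is a minimal illustrative sketch (not the paper's exact algorithm) of a truncated incremental SVD over a stream of feature rows, with a frequent-directions-style shrinking step: whenever the sketch exceeds the target rank k, the squared singular values are reduced by the (k+1)-th one before truncation. The function name and the specific shrinkage rule are assumptions for illustration.

```python
import numpy as np

def shrunken_incremental_svd(rows, k):
    """Maintain a rank-k sketch B of a stream of rows via truncated SVD,
    shrinking singular values before each truncation.

    Illustrative only: the shrinkage (subtracting the (k+1)-th squared
    singular value) follows the frequent-directions idea; the paper's
    variant shrinks periodically with its own schedule and guarantees.
    """
    B = None  # current sketch, at most k rows
    for a in rows:
        a = np.asarray(a, dtype=float).reshape(1, -1)
        B = a if B is None else np.vstack([B, a])
        if B.shape[0] > k:
            # Thin SVD of the (k+1)-row sketch.
            U, s, Vt = np.linalg.svd(B, full_matrices=False)
            # Shrink all squared singular values by the smallest one,
            # then keep only the top-k right singular directions.
            s_shrunk = np.sqrt(np.maximum(s**2 - s[k]**2, 0.0))[:k]
            B = s_shrunk[:, None] * Vt[:k]
    return B
```

Because shrinking only removes spectral mass, the sketch underestimates the Gram matrix: A^T A - B^T B stays positive semidefinite, which is what makes such sketches safe to plug into LSTD-style second-order updates.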
Acknowledgements
The corresponding author Weinan Zhang was supported by the “New Generation of AI 2030” Major Project (2018AAA0100900) and the National Natural Science Foundation of China (Grant Nos. 62076161, 61772333, 61632017).
Author information
Cheng Chen is currently a PhD candidate in the APEX Lab at Shanghai Jiao Tong University, China. He received his bachelor’s degree from the Department of Computer Science at Shanghai Jiao Tong University, China in 2013. His research interests lie in matrix approximation, online learning, and optimization.
Weinan Zhang received his PhD degree from University College London in 2016 and his BS degree from the ACM Class of Shanghai Jiao Tong University, China in 2011. He is currently an assistant professor with the Department of Computer Science, Shanghai Jiao Tong University. He has published over 50 research papers in conferences and journals, including KDD, SIGIR, AAAI, WWW, WSDM, ICDM, JMLR, and IPM. His research interests include machine learning and big data mining, particularly deep learning and reinforcement learning techniques for real-world data mining scenarios such as computational advertising, recommendation systems, text mining, Web search, and knowledge graphs.
Yong Yu received his MS degree from the CS Department, East China Normal University, China. He is currently a professor with the Department of Computer Science, Shanghai Jiao Tong University, China, and the Director of the Apex Data & Knowledge Management Lab. As the principal investigator, he has led several National Natural Science Foundation of China and China National High Tech (863) Program projects. His research interests include Web search, semantic search, data mining, and machine learning. He has published over 200 papers and served as a PC member of several conferences, including WWW and RecSys, as well as a dozen other related conferences such as NIPS, ICML, SIGIR, and ISWC.
Cite this article
Chen, C., Zhang, W. & Yu, Y. Efficient policy evaluation by matrix sketching. Front. Comput. Sci. 16, 165330 (2022). https://doi.org/10.1007/s11704-021-0354-4