
Residual Sarsa algorithm with function approximation

Cluster Computing

Abstract

In this work, we propose an efficient algorithm, the residual Sarsa algorithm with function approximation (FARS), to improve the performance of the traditional Sarsa algorithm, using the gradient-descent method to update the function parameter vector. During learning, the Bellman residual method is adopted to guarantee convergence of the algorithm, and a new rule for updating the parameter vector of the action-value function is adopted to address instability and slow convergence. To accelerate the convergence rate, we introduce a new factor, the forgetting factor, which helps improve the robustness of the algorithm's performance. On two classical reinforcement learning benchmark problems, experimental results show that the FARS algorithm outperforms other related reinforcement learning algorithms.
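The abstract does not give the exact update equations, but the ingredients it names (a Bellman residual minimized by gradient descent over a parameterized action-value function, plus a forgetting factor intended to stabilize and speed up the updates) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' FARS algorithm: the linear features phi, the step size alpha, the discount gamma, and in particular the trace-style use of the forgetting factor beta are assumptions made only for this example.

```python
import numpy as np

# Minimal sketch of a residual-gradient Sarsa update with linear function
# approximation, based only on the abstract. The exact FARS update rule and
# the precise role of the forgetting factor are not specified there, so the
# names (phi_sa, alpha, gamma, beta) and the decayed-trace treatment of the
# forgetting factor are illustrative assumptions, not the authors' method.

def residual_sarsa_update(theta, phi_sa, phi_next_sa, reward, done,
                          alpha=0.05, gamma=0.99, beta=0.9, trace=None):
    """One gradient-descent step on the squared Bellman residual.

    theta       : parameter vector of the linear action-value function Q = theta . phi
    phi_sa      : feature vector of the current state-action pair (s, a)
    phi_next_sa : feature vector of the next state-action pair (s', a')
    beta        : assumed forgetting factor decaying older update directions
    trace       : running, exponentially decayed sum of past gradient directions
    """
    if trace is None:
        trace = np.zeros_like(theta)

    q = theta @ phi_sa
    q_next = 0.0 if done else theta @ phi_next_sa
    residual = reward + gamma * q_next - q  # Bellman residual

    # Descent direction for the loss 0.5 * residual**2:
    # -(d/d theta) 0.5 * residual**2 = residual * (phi_sa - gamma * phi_next_sa)
    grad = residual * (phi_sa if done else (phi_sa - gamma * phi_next_sa))

    # Assumed role of the forgetting factor: exponentially down-weight the
    # contribution of older gradients while accelerating consistent directions.
    trace = beta * trace + grad
    theta = theta + alpha * trace
    return theta, trace
```

In a full agent this update would sit inside an episode loop with an epsilon-greedy behaviour policy; the plain residual-gradient step is known to converge but can be slow, which is the gap the forgetting factor is meant to close according to the abstract.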



Acknowledgements

This research was partially supported by the National Natural Science Foundation of China (61672371, 61602334, 61502329, 61502323, 61272005, 61303108, 61373094, 61472262), the Natural Science Foundation of Jiangsu (BK20140283, BK2012616), the High School Natural Foundation of Jiangsu (13KJB520020), the Foundation of the Ministry of Housing and Urban-Rural Development of the People's Republic of China (2015-K1-047), the Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University (93K172014K04), and the Suzhou Industrial Application of Basic Research Program (SYG201422). We declare that there is no conflict of interest regarding the publication of this article.

Author information


Corresponding author

Correspondence to Chen Jianping.


About this article


Cite this article

Qiming, F., Wen, H., Quan, L. et al. Residual Sarsa algorithm with function approximation. Cluster Comput 22 (Suppl 1), 795–807 (2019). https://doi.org/10.1007/s10586-017-1303-8


