Hybrid Temporal-Difference Algorithm Using Sliding Mode Control and Sigmoid Function

Xu, Ke; Wu, Fengge

doi:10.1007/978-3-319-42911-3_51

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9810)

Included in the following conference series:

Pacific Rim International Conference on Artificial Intelligence

Abstract

Gradient temporal-difference algorithms such as GTD2 and TDC have raised the accuracy of temporal-difference learning to a new level. Unfortunately, these algorithms converge much more slowly than conventional temporal-difference algorithms. In this paper, we present an approach based on sliding mode control to speed up the GTD2 algorithm, and then use a sigmoid function to reduce the algorithm's jitter. Our experiments on a random walk show that our algorithm converges as fast as conventional temporal-difference algorithms while being as accurate as GTD2. This is an important property for online-learning tasks.
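For orientation, the two baselines named in the abstract can be sketched as follows. This is a minimal illustration of the textbook TD(0) and GTD2 updates with linear function approximation on a small random walk, not the authors' hybrid method: the sliding-mode switching rule and the sigmoid smoothing are not specified in the abstract and are deliberately left out. The function names, the 5-state chain, the one-hot features, and the step sizes are all assumptions made for the example.

```python
import random

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def one_hot(i, n):
    """One-hot feature vector; terminal states (outside 0..n-1) map to zeros."""
    v = [0.0] * n
    if 0 <= i < n:
        v[i] = 1.0
    return v

def td0_update(theta, phi, reward, phi_next, alpha, gamma):
    """Conventional TD(0): theta <- theta + alpha * delta * phi."""
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    return [t + alpha * delta * p for t, p in zip(theta, phi)]

def gtd2_update(theta, w, phi, reward, phi_next, alpha, beta, gamma):
    """GTD2: theta follows (phi - gamma*phi') * (phi.w); w tracks delta."""
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    phi_w = dot(phi, w)
    theta_new = [t + alpha * (p - gamma * pn) * phi_w
                 for t, p, pn in zip(theta, phi, phi_next)]
    w_new = [wi + beta * (delta - phi_w) * p for wi, p in zip(w, phi)]
    return theta_new, w_new

def run_random_walk(method, episodes=2000, n_states=5,
                    alpha=0.05, beta=0.05, gamma=1.0, seed=0):
    """Estimate state values on a symmetric random walk (assumed setup).

    Episodes start in the middle state; stepping off the right end
    gives reward 1, stepping off the left end gives reward 0.
    """
    if method not in ("td0", "gtd2"):
        raise ValueError("method must be 'td0' or 'gtd2'")
    rng = random.Random(seed)
    theta = [0.0] * n_states
    w = [0.0] * n_states  # auxiliary weights, used by GTD2 only
    for _ in range(episodes):
        s = n_states // 2
        while 0 <= s < n_states:
            s_next = s + rng.choice((-1, 1))
            reward = 1.0 if s_next == n_states else 0.0
            phi, phi_next = one_hot(s, n_states), one_hot(s_next, n_states)
            if method == "td0":
                theta = td0_update(theta, phi, reward, phi_next, alpha, gamma)
            else:
                theta, w = gtd2_update(theta, w, phi, reward, phi_next,
                                       alpha, beta, gamma)
            s = s_next
    return theta

if __name__ == "__main__":
    print("TD(0):", run_random_walk("td0"))
    print("GTD2: ", run_random_walk("gtd2"))
```

Running both methods with the same seed and episode budget and comparing the estimates against the true values of this chain (1/6 through 5/6) gives a rough feel for the speed gap the abstract describes. GTD2 needs the second weight vector `w` to settle before `theta` moves, which is one reason it tends to lag behind TD(0) early on.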

Acknowledgment

This research was supported and partially sponsored by the National Natural Science Foundation of China Grant No. 61202218.

Author information

Authors and Affiliations

Science and Technology on Integrated Information System Laboratory, Institute of Software Chinese Academy of Sciences, Beijing, China
Ke Xu & Fengge Wu

Corresponding author

Correspondence to Ke Xu.

Editor information

Editors and Affiliations

Cardiff University, Cardiff, United Kingdom
Richard Booth
Southeast University, Nanjing, China
Min-Ling Zhang

About this paper

Cite this paper

Xu, K., Wu, F. (2016). Hybrid Temporal-Difference Algorithm Using Sliding Mode Control and Sigmoid Function. In: Booth, R., Zhang, ML. (eds) PRICAI 2016: Trends in Artificial Intelligence. PRICAI 2016. Lecture Notes in Computer Science, vol 9810. Springer, Cham. https://doi.org/10.1007/978-3-319-42911-3_51

DOI: https://doi.org/10.1007/978-3-319-42911-3_51
Published: 10 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42910-6
Online ISBN: 978-3-319-42911-3
eBook Packages: Computer Science (R0)
