Abstract
Gradient temporal-difference algorithms such as GTD2 and TDC have raised the accuracy of temporal-difference learning to a new level. Unfortunately, these algorithms converge much more slowly than conventional temporal-difference algorithms. In this paper, we present an approach based on sliding mode control to speed up the GTD2 algorithm, and then use a sigmoid function to reduce the algorithm's jitter. Our experiments on a random walk show that our algorithm converges as fast as conventional temporal-difference algorithms while being as accurate as the GTD2 algorithm. This is an important property for online learning tasks.
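For orientation, the two baselines named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the hybrid algorithm of the paper, whose update rule the abstract does not give. It shows the standard TD(0) and GTD2 updates with linear function approximation, plus a sigmoid-based smooth replacement for the sign function, a common way to reduce chattering in sliding mode control. The function names, the step sizes, the gain `k`, and the five-state random walk with one-hot features are all assumptions chosen for illustration.

```python
import numpy as np


def td0_update(theta, phi, reward, phi_next, alpha, gamma):
    """One conventional TD(0) step for the linear value estimate theta @ phi."""
    delta = reward + gamma * theta @ phi_next - theta @ phi
    return theta + alpha * delta * phi


def gtd2_update(theta, w, phi, reward, phi_next, alpha, beta, gamma):
    """One standard GTD2 step; w is the auxiliary weight vector."""
    delta = reward + gamma * theta @ phi_next - theta @ phi
    theta_new = theta + alpha * (phi - gamma * phi_next) * (phi @ w)
    w_new = w + beta * (delta - phi @ w) * phi
    return theta_new, w_new


def smooth_sign(s, k=10.0):
    """Sigmoid-based stand-in for sign(s): 2*sigmoid(k*s) - 1, valued in (-1, 1).

    The gain k is a hypothetical parameter, not a value from the paper.
    """
    return 2.0 / (1.0 + np.exp(-k * s)) - 1.0


def random_walk_episode(rng, n_states=5):
    """Yield (phi, reward, phi_next) for one episode of a symmetric random walk.

    Assumed setup: one-hot features, start in the middle state, reward 1 on
    exiting to the right, reward 0 otherwise, zero features at termination.
    """
    eye = np.eye(n_states)
    s = n_states // 2
    while True:
        s_next = s + int(rng.choice([-1, 1]))
        if s_next < 0:
            yield eye[s], 0.0, np.zeros(n_states)
            return
        if s_next >= n_states:
            yield eye[s], 1.0, np.zeros(n_states)
            return
        yield eye[s], 0.0, eye[s_next]
        s = s_next


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta_td = np.full(5, 0.5)
    theta_gtd, w = np.full(5, 0.5), np.zeros(5)
    for _ in range(500):
        for phi, r, phi_next in random_walk_episode(rng):
            theta_td = td0_update(theta_td, phi, r, phi_next, 0.1, 1.0)
            theta_gtd, w = gtd2_update(theta_gtd, w, phi, r, phi_next,
                                       0.05, 0.1, 1.0)
    print("TD(0):", np.round(theta_td, 3))
    print("GTD2: ", np.round(theta_gtd, 3))
```

Running the script prints the value estimates of both baselines on the same stream of transitions, which gives a rough feel for the difference in convergence speed that the abstract refers to.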
Acknowledgment
This research was supported and partially sponsored by the National Natural Science Foundation of China Grant No. 61202218.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Xu, K., Wu, F. (2016). Hybrid Temporal-Difference Algorithm Using Sliding Mode Control and Sigmoid Function. In: Booth, R., Zhang, ML. (eds) PRICAI 2016: Trends in Artificial Intelligence. PRICAI 2016. Lecture Notes in Computer Science, vol 9810. Springer, Cham. https://doi.org/10.1007/978-3-319-42911-3_51
DOI: https://doi.org/10.1007/978-3-319-42911-3_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42910-6
Online ISBN: 978-3-319-42911-3
eBook Packages: Computer Science, Computer Science (R0)