Hybrid Temporal-Difference Algorithm Using Sliding Mode Control and Sigmoid Function

  • Conference paper
PRICAI 2016: Trends in Artificial Intelligence (PRICAI 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9810)

Abstract

Gradient temporal-difference algorithms such as GTD2 and TDC have substantially improved the accuracy of temporal-difference learning. Unfortunately, these algorithms converge much more slowly than conventional temporal-difference algorithms. In this paper, we present an approach based on sliding mode control to speed up the GTD2 algorithm, and then use a sigmoid function to reduce the algorithm's jitter. Our experiments on the random walk task show that our algorithm converges as fast as conventional temporal-difference algorithms while remaining as accurate as GTD2. This combination of speed and accuracy is important for online learning tasks.
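The abstract contrasts conventional TD learning with GTD2 and mentions replacing a discontinuous sliding-mode switching term with a sigmoid. The sketch below is a hedged illustration of those ingredients, not the authors' hybrid algorithm (the abstract does not give its update rule). It assumes the classic 5-state random walk (start in the centre, reward +1 only on the right exit, γ = 1) with tabular one-hot features, and uses the linear TD(0) and GTD2 updates as published by Sutton et al. (2009). Separately, it shows the standard sliding-mode trick of replacing sign(s) with the sigmoid-shaped 2/(1 + e^(-s/ε)) - 1 on a hypothetical scalar system. All function names, step sizes and ε are illustrative assumptions.

```python
import math
import random

# Assumed setup: the classic 5-state random walk (states A..E = 0..4), every
# episode starts in the centre, reward +1 only when exiting on the right, gamma = 1.
N_STATES = 5
TRUE_VALUES = [(i + 1) / (N_STATES + 1) for i in range(N_STATES)]


def features(s):
    """Tabular (one-hot) features; both terminal states map to the zero vector."""
    phi = [0.0] * N_STATES
    if 0 <= s < N_STATES:
        phi[s] = 1.0
    return phi


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def episode(rng):
    """Yield (s, r, s_next) transitions until the walk leaves either end."""
    s = N_STATES // 2
    while 0 <= s < N_STATES:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        r = 1.0 if s_next == N_STATES else 0.0
        yield s, r, s_next
        s = s_next


def td0(episodes, alpha=0.1, gamma=1.0, seed=0):
    """Conventional linear TD(0): theta += alpha * delta * phi."""
    rng = random.Random(seed)
    theta = [0.0] * N_STATES
    for _ in range(episodes):
        for s, r, s2 in episode(rng):
            phi, phi2 = features(s), features(s2)
            delta = r + gamma * dot(theta, phi2) - dot(theta, phi)
            theta = [t + alpha * delta * p for t, p in zip(theta, phi)]
    return theta


def gtd2(episodes, alpha=0.02, beta=0.1, gamma=1.0, seed=0):
    """GTD2 (Sutton et al., 2009):
    theta += alpha * (phi - gamma * phi') * (phi . w)
    w     += beta * (delta - phi . w) * phi
    Both updates use delta and w from before this step."""
    rng = random.Random(seed)
    theta = [0.0] * N_STATES
    w = [0.0] * N_STATES
    for _ in range(episodes):
        for s, r, s2 in episode(rng):
            phi, phi2 = features(s), features(s2)
            delta = r + gamma * dot(theta, phi2) - dot(theta, phi)
            phi_w = dot(phi, w)
            theta = [t + alpha * (p - gamma * p2) * phi_w
                     for t, p, p2 in zip(theta, phi, phi2)]
            w = [wi + beta * (delta - phi_w) * p for wi, p in zip(w, phi)]
    return theta


def rmse(theta):
    """Root-mean-square error against the known true values 1/6, ..., 5/6."""
    return math.sqrt(sum((t - v) ** 2 for t, v in zip(theta, TRUE_VALUES)) / N_STATES)


def sign(x):
    """Discontinuous switching term used in classic sliding mode control."""
    return (x > 0) - (x < 0)


def sigmoid_switch(x, eps=0.05):
    """Sigmoid-shaped replacement for sign(x): 2 / (1 + exp(-x/eps)) - 1.
    Written as tanh(x / (2*eps)), an identical form that cannot overflow."""
    return math.tanh(x / (2.0 * eps))


def final_abs_state(switch, x0=1.0, gain=1.0, h=0.03, steps=200):
    """Hypothetical scalar system dx/dt = -gain * switch(x), Euler step h.
    With sign() the state keeps oscillating around 0; with the sigmoid it settles."""
    x = x0
    for _ in range(steps):
        x -= h * gain * switch(x)
    return abs(x)
```

For instance, `rmse(td0(2000, alpha=0.05))` and `rmse(gtd2(10000))` can be compared with the RMSE of the all-zero initial estimate (about 0.55), and `final_abs_state(sign)` versus `final_abs_state(sigmoid_switch)` shows the sign-based update oscillating at the scale of one step while the smooth switch settles. The step sizes, episode counts and ε are untuned assumptions chosen only so the sketch converges.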

Acknowledgment

This research was supported in part by the National Natural Science Foundation of China under Grant No. 61202218.

Author information

Correspondence to Ke Xu.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Xu, K., Wu, F. (2016). Hybrid Temporal-Difference Algorithm Using Sliding Mode Control and Sigmoid Function. In: Booth, R., Zhang, ML. (eds) PRICAI 2016: Trends in Artificial Intelligence. PRICAI 2016. Lecture Notes in Computer Science, vol 9810. Springer, Cham. https://doi.org/10.1007/978-3-319-42911-3_51

  • DOI: https://doi.org/10.1007/978-3-319-42911-3_51

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42910-6

  • Online ISBN: 978-3-319-42911-3

  • eBook Packages: Computer Science, Computer Science (R0)