Abstract
Q(λ)-learning uses TD(λ) methods to accelerate Q-learning. For previous online, lookup-table-based implementations of Q(λ), the worst-case complexity of a single update step is bounded by the size of the state/action space. Our faster algorithm's worst-case complexity per update step is bounded by the number of actions. The algorithm is based on the observation that Q-value updates may be postponed until they are needed.
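The lazy-update idea stated in the abstract can be illustrated with a small sketch. The Python code below is not the paper's algorithm or pseudocode; it is a minimal, hypothetical tabular variant assuming a Peng-style Q(λ) with replacing traces and no trace cutting, and all names (LazyQLambda, acc, decay_pow, etc.) are illustrative. Instead of decaying every eligibility trace at each step (cost proportional to the size of the state/action space), it keeps one global accumulator of discounted TD errors and brings an individual Q-value up to date only when it is read or written, so a single step touches only the actions of the current and next state.

from collections import defaultdict

class LazyQLambda:
    def __init__(self, actions, alpha=0.1, gamma=0.95, lam=0.9):
        self.actions = list(actions)
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.q = defaultdict(float)   # committed Q-values, keyed by (state, action)
        self.trace = {}               # normalized replacing trace: e(s,a) / (gamma*lam)^t0
        self.last_acc = {}            # value of self.acc when (s,a) was last synced
        self.acc = 0.0                # global accumulator: sum_k (gamma*lam)^k * delta_k
        self.decay_pow = 1.0          # (gamma*lam)^t for the current step t

    def _sync(self, sa):
        # Commit the postponed, trace-weighted TD errors to Q(s,a).
        if sa in self.trace:
            self.q[sa] += self.alpha * self.trace[sa] * (self.acc - self.last_acc[sa])
            self.last_acc[sa] = self.acc

    def value(self, state, action):
        self._sync((state, action))
        return self.q[(state, action)]

    def greedy_action(self, state):
        return max(self.actions, key=lambda a: self.value(state, a))

    def update(self, state, action, reward, next_state):
        # One-step TD error from up-to-date Q-values; only the actions of
        # next_state and the current pair are synced, so this is O(|A|) work.
        best_next = self.value(next_state, self.greedy_action(next_state))
        delta = reward + self.gamma * best_next - self.value(state, action)

        # Start (or replace) the trace of the current pair *before* folding the
        # new TD error into the accumulator, so this pair is credited with delta.
        sa = (state, action)
        self._sync(sa)                      # commit credit from an earlier visit
        self.trace[sa] = 1.0 / self.decay_pow
        self.last_acc[sa] = self.acc

        self.acc += self.decay_pow * delta  # postponed credit for all traced pairs
        self.decay_pow *= self.gamma * self.lam

        # Omitted for brevity: periodic renormalization of acc / decay_pow / trace
        # to avoid numerical under- or overflow, and Watkins-style trace cutting
        # after exploratory (non-greedy) actions.

In this sketch an agent would call greedy_action(s) to act and update(s, a, r, s') once per transition; the pending trace-weighted corrections for all other state/action pairs remain implicit in acc until those pairs are next accessed.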
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wiering, M., Schmidhuber, J. (1998). Speeding up Q(λ)-learning. In: Nédellec, C., Rouveirol, C. (eds) Machine Learning: ECML-98. ECML 1998. Lecture Notes in Computer Science, vol 1398. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0026706
DOI: https://doi.org/10.1007/BFb0026706
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64417-0
Online ISBN: 978-3-540-69781-7