Q-Learning in Continuous State and Action Spaces

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1747)

Abstract

Q-learning can be used to learn a control policy that maximises a scalar reward through interaction with the environment. Q-learning is commonly applied to problems with discrete states and actions. We describe a method suitable for control tasks that require continuous actions in response to continuous states. The system consists of a neural network coupled with a novel interpolator. Simulation results are presented for a non-holonomic control task. Advantage Learning, a variation of Q-learning, is shown to enhance learning speed and reliability for this task.
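
The abstract contrasts standard Q-learning with Advantage Learning. As a point of reference only, the sketch below shows the two one-step update rules side by side on a tabular (discretised) toy problem; it is not the paper's neural-network-plus-interpolator architecture for continuous states and actions, and the names and values alpha (learning rate), gamma (discount factor) and k (advantage scaling) are illustrative assumptions, not parameters taken from the paper.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    # One-step Q-learning (Watkins): move Q(s, a) towards the
    # bootstrapped target r + gamma * max_a' Q(s', a').
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def advantage_update(A, s, a, r, s_next, alpha=0.1, gamma=0.95, k=0.3):
    # Advantage learning: the best action's value is learned as in
    # Q-learning, but the shortfall of a sub-optimal action relative
    # to V(s) is scaled up by 1/k, widening the gap between actions.
    v_s = np.max(A[s])          # V(s)  = max_a  A(s, a)
    v_next = np.max(A[s_next])  # V(s') = max_a' A(s', a')
    target = v_s + (r + gamma * v_next - v_s) / k
    A[s, a] += alpha * (target - A[s, a])
```

With k = 1 the advantage target reduces to the ordinary Q-learning target r + gamma * max_a' A(s', a'); smaller values of k exaggerate the difference between the best action and the others, making the greedy policy easier to extract from an approximate value function.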




Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gaskett, C., Wettergreen, D., Zelinsky, A. (1999). Q-Learning in Continuous State and Action Spaces. In: Foo, N. (eds) Advanced Topics in Artificial Intelligence. AI 1999. Lecture Notes in Computer Science (LNAI), vol 1747. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46695-9_35

  • DOI: https://doi.org/10.1007/3-540-46695-9_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66822-0

  • Online ISBN: 978-3-540-46695-6

  • eBook Packages: Springer Book Archive
