Q-Learning in Continuous State and Action Spaces

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1747)

Abstract

Q-learning can be used to learn a control policy that maximises a scalar reward through interaction with the environment. Q-learning is commonly applied to problems with discrete states and actions. We describe a method suitable for control tasks that require continuous actions in response to continuous states. The system consists of a neural network coupled with a novel interpolator. Simulation results are presented for a non-holonomic control task. Advantage Learning, a variation of Q-learning, is shown to enhance learning speed and reliability for this task.
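
The abstract contrasts standard Q-learning with Advantage Learning. As a point of reference only, the sketch below shows the two one-step update rules side by side on a tabular (discretised) toy problem; it is not the paper's neural-network-plus-interpolator architecture for continuous states and actions, and the names and values alpha (learning rate), gamma (discount factor) and k (advantage scaling) are illustrative assumptions, not parameters taken from the paper.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    # One-step Q-learning (Watkins): move Q(s, a) towards the
    # bootstrapped target r + gamma * max_a' Q(s', a').
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def advantage_update(A, s, a, r, s_next, alpha=0.1, gamma=0.95, k=0.3):
    # Advantage learning: the best action's value is learned as in
    # Q-learning, but the shortfall of a sub-optimal action relative
    # to V(s) is scaled up by 1/k, widening the gap between actions.
    v_s = np.max(A[s])          # V(s)  = max_a  A(s, a)
    v_next = np.max(A[s_next])  # V(s') = max_a' A(s', a')
    target = v_s + (r + gamma * v_next - v_s) / k
    A[s, a] += alpha * (target - A[s, a])
```

With k = 1 the advantage target reduces to the ordinary Q-learning target r + gamma * max_a' A(s', a'); smaller values of k exaggerate the difference between the best action and the others, making the greedy policy easier to extract from an approximate value function.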




Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gaskett, C., Wettergreen, D., Zelinsky, A. (1999). Q-Learning in Continuous State and Action Spaces. In: Foo, N. (eds) Advanced Topics in Artificial Intelligence. AI 1999. Lecture Notes in Computer Science (LNAI), vol 1747. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46695-9_35

  • DOI: https://doi.org/10.1007/3-540-46695-9_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66822-0

  • Online ISBN: 978-3-540-46695-6

  • eBook Packages: Springer Book Archive
