
Reinforcement learning of dynamic behavior by using recurrent neural networks

  • Original Paper
  • Published in: Artificial Life and Robotics

Abstract

Reinforcement learning is a learning scheme for finding an optimal policy to control a system, based on a scalar signal representing a reward or a punishment. If the controller's observation of the system is rich enough to determine the system's internal state, the controller can achieve the optimal policy simply by learning reactive behavior. However, if the state of the controlled system cannot be determined completely from the current sensory observation, the controller must learn dynamic behavior to achieve the optimal policy.
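In the fully observed case described above, the optimal policy is a purely reactive mapping from observation to action. A minimal sketch of this, using tabular Q-learning in the sense of Watkins on a hypothetical five-state chain task (the task, parameter values, and function names below are illustrative, not from the paper):

```python
import random

N_STATES, ACTIONS = 5, (0, 1)      # states 0..4; reward on reaching state 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.3

def step(s, a):
    """Deterministic chain: action 0 moves left, action 1 moves right.
    The controller observes the state itself, so the task is fully observed."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, reward, s2 == N_STATES - 1

def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = rng.randrange(N_STATES - 1)     # random nonterminal start state
        for _t in range(100):               # step cap per episode
            a = (rng.choice(ACTIONS) if rng.random() < EPSILON
                 else max(ACTIONS, key=lambda act: q[s][act]))
            s2, r, done = step(s, a)
            # Q-learning update: bootstrap from the best successor action.
            q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
            s = s2
            if done:
                break
    return q

q = train()
greedy = [max(ACTIONS, key=lambda act: q[s][act]) for s in range(N_STATES)]
```

Because the observation determines the state, a memoryless greedy lookup over the learned table recovers the optimal "always move right" policy.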

In this paper, we propose a dynamic controller scheme which uses a memory of past system outputs to uncover hidden states, and makes control decisions based on that memory. The scheme integrates Q-learning, as proposed by Watkins, with recurrent neural networks of several types. It performs favorably in simulations of a task with hidden states.
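The value of such memory can be seen even without a recurrent network. The sketch below is an illustrative stand-in for the scheme, not the paper's implementation: on a hypothetical cue-recall task, a one-step buffer of the previous observation plays the role the recurrent network plays in the paper. A reactive policy over the aliased current observation can only earn chance reward, while the memory-augmented controller recovers the hidden state and the optimal policy:

```python
import random

ACTIONS = (0, 1)
ALPHA, EPSILON, EPISODES = 0.2, 0.2, 3000
ALIASED = 2   # observation at decision time; identical for both hidden cues

def run(use_memory, seed=0):
    """Train a tabular controller whose internal state either includes a
    one-step memory of the previous observation, or is purely reactive.
    At t=0 the controller observes a cue (0 or 1); at t=1 it sees only the
    aliased symbol and is rewarded for choosing the action matching the cue."""
    rng = random.Random(seed)
    q = {}   # maps (internal_state, action) -> estimated value
    for _ in range(EPISODES):
        cue = rng.choice((0, 1))            # hidden state, visible only at t=0
        s = (ALIASED, cue) if use_memory else ALIASED
        if rng.random() < EPSILON:
            a = rng.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: q.get((s, act), 0.0))
        r = 1.0 if a == cue else 0.0        # reward depends on the hidden cue
        # One decision per episode, so the update is a running average.
        q[(s, a)] = q.get((s, a), 0.0) + ALPHA * (r - q.get((s, a), 0.0))
    return q

def greedy_reward(q, use_memory):
    """Average reward of the greedy policy over both possible cues."""
    total = 0.0
    for cue in (0, 1):
        s = (ALIASED, cue) if use_memory else ALIASED
        a = max(ACTIONS, key=lambda act: q.get((s, act), 0.0))
        total += 1.0 if a == cue else 0.0
    return total / 2
```

Without memory, both cues map to the same internal state, so any greedy choice is correct for exactly one of them (average reward 0.5); with memory, each remembered cue gets its own Q-values and the greedy policy is perfect. The paper's recurrent networks learn such a memory representation rather than having it hand-wired.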


References

  1. Sutton RS (1988) Learning to predict by the methods of temporal differences. Mach Learn 3:9–44

  2. Watkins C, Dayan P (1992) Q-learning. Mach Learn 8:279–292

  3. Barto AG, Bradtke SJ, Singh SP (1993) Learning to act using real-time dynamic programming. Technical Report, Department of Computer Science, University of Massachusetts

  4. Werbos P (1990) A menu of designs for reinforcement learning over time. In: Neural networks for control. MIT Press, Cambridge, pp 67–95

  5. Whitehead SD, Ballard DH (1990) Active perception and reinforcement learning. Neural Comput 2:409–419

  6. Whitehead SD, Ballard DH (1991) Learning to perceive and act by trial and error. Mach Learn 7:45–83

  7. Tan M (1991) Cost-sensitive reinforcement learning for adaptive classification and control. Proceedings 9th National Conference on Artificial Intelligence 2:774–780

  8. Chrisman L (1992) Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. Proceedings AAAI'92, pp 183–188

  9. McCallum A (1995) Hidden state and reinforcement learning with instance-based state identification. Technical Report 502, University of Rochester

  10. Littman ML (1994) Memoryless policies: Theoretical limitations and practical results. Proceedings Third International Conference on Simulation of Adaptive Behavior (Animats 3), pp 238–245

  11. Onat A, Kita H, Nishikawa Y (1995) Reinforcement learning of dynamic behavior by using recurrent neural networks. Proceedings World Conference on Neural Networks 2:342–345

  12. Elman JL (1990) Finding structure in time. Cognit Sci 14:179–211

  13. Noda I (1994) Neural networks that learn symbolic and structured representation of information. Researches of the Electrotechnical Laboratory No. 968. Agency of Industrial Science and Technology, Japan

  14. Lin L (1992) Self-improving reactive agents based on reinforcement learning, planning and teaching. Mach Learn 8:293–321

  15. Werbos P (1974) Beyond regression: New tools for prediction and analysis in the behavioral sciences. PhD thesis, Harvard University

  16. Nishikawa Y, Kitamura S (eds) (1995) Neural networks as applied to measurement and control (in Japanese). Asakura, Japan

Author information

Correspondence to Ahmet Onat.

About this article

Cite this article

Onat, A., Kita, H. & Nishikawa, Y. Reinforcement learning of dynamic behavior by using recurrent neural networks. Artificial Life and Robotics 1, 117–121 (1997). https://doi.org/10.1007/BF02471125

