
Reinforcement learning of dynamic behavior by using recurrent neural networks

  • Original Paper
  • Published in: Artificial Life and Robotics

Abstract

Reinforcement learning is a learning scheme for finding an optimal policy to control a system, based on a scalar signal representing a reward or a punishment. If the controller's observation of the system is rich enough to determine the system's internal state, the controller can achieve the optimal policy simply by learning reactive behavior. However, if the state of the controlled system cannot be determined completely from the current sensory observation, the controller must learn dynamic behavior to achieve the optimal policy.
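In the fully observed case described above, the optimal policy is a purely reactive mapping from observation to action. A minimal sketch of this, using tabular Q-learning in the sense of Watkins on a hypothetical five-state chain task (the task, parameter values, and function names below are illustrative, not from the paper):

```python
import random

N_STATES, ACTIONS = 5, (0, 1)      # states 0..4; reward on reaching state 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.3

def step(s, a):
    """Deterministic chain: action 0 moves left, action 1 moves right.
    The controller observes the state itself, so the task is fully observed."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, reward, s2 == N_STATES - 1

def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = rng.randrange(N_STATES - 1)     # random nonterminal start state
        for _t in range(100):               # step cap per episode
            a = (rng.choice(ACTIONS) if rng.random() < EPSILON
                 else max(ACTIONS, key=lambda act: q[s][act]))
            s2, r, done = step(s, a)
            # Q-learning update: bootstrap from the best successor action.
            q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
            s = s2
            if done:
                break
    return q

q = train()
greedy = [max(ACTIONS, key=lambda act: q[s][act]) for s in range(N_STATES)]
```

Because the observation determines the state, a memoryless greedy lookup over the learned table recovers the optimal "always move right" policy.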

In this paper, we propose a dynamic controller scheme which uses a memory of past system outputs to uncover hidden states, and makes control decisions based on that memory. The scheme integrates Q-learning, as proposed by Watkins, with recurrent neural networks of several types. It performs favorably in simulations of a task with hidden states.
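The value of such memory can be seen even without a recurrent network. The sketch below is an illustrative stand-in for the scheme, not the paper's implementation: on a hypothetical cue-recall task, a one-step buffer of the previous observation plays the role the recurrent network plays in the paper. A reactive policy over the aliased current observation can only earn chance reward, while the memory-augmented controller recovers the hidden state and the optimal policy:

```python
import random

ACTIONS = (0, 1)
ALPHA, EPSILON, EPISODES = 0.2, 0.2, 3000
ALIASED = 2   # observation at decision time; identical for both hidden cues

def run(use_memory, seed=0):
    """Train a tabular controller whose internal state either includes a
    one-step memory of the previous observation, or is purely reactive.
    At t=0 the controller observes a cue (0 or 1); at t=1 it sees only the
    aliased symbol and is rewarded for choosing the action matching the cue."""
    rng = random.Random(seed)
    q = {}   # maps (internal_state, action) -> estimated value
    for _ in range(EPISODES):
        cue = rng.choice((0, 1))            # hidden state, visible only at t=0
        s = (ALIASED, cue) if use_memory else ALIASED
        if rng.random() < EPSILON:
            a = rng.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: q.get((s, act), 0.0))
        r = 1.0 if a == cue else 0.0        # reward depends on the hidden cue
        # One decision per episode, so the update is a running average.
        q[(s, a)] = q.get((s, a), 0.0) + ALPHA * (r - q.get((s, a), 0.0))
    return q

def greedy_reward(q, use_memory):
    """Average reward of the greedy policy over both possible cues."""
    total = 0.0
    for cue in (0, 1):
        s = (ALIASED, cue) if use_memory else ALIASED
        a = max(ACTIONS, key=lambda act: q.get((s, act), 0.0))
        total += 1.0 if a == cue else 0.0
    return total / 2
```

Without memory, both cues map to the same internal state, so any greedy choice is correct for exactly one of them (average reward 0.5); with memory, each remembered cue gets its own Q-values and the greedy policy is perfect. The paper's recurrent networks learn such a memory representation rather than having it hand-wired.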


References

  1. Sutton RS (1988) Learning to predict by the methods of temporal differences. Mach Learn 3:9–44

  2. Watkins C, Dayan P (1992) Q-learning. Mach Learn 8:279–292

  3. Barto AG, Bradtke SJ, Singh SP (1993) Learning to act using real-time dynamic programming. Technical Report, Department of Computer Science, University of Massachusetts

  4. Werbos P (1990) A menu of designs for reinforcement learning over time. In: Neural networks for control. MIT Press, Cambridge, pp 67–95

  5. Whitehead SD, Ballard DH (1990) Active perception and reinforcement learning. Neural Comput 2:409–419

  6. Whitehead SD, Ballard DH (1991) Learning to perceive and act by trial and error. Mach Learn 7:45–83

  7. Tan M (1991) Cost-sensitive reinforcement learning for adaptive classification and control. Proceedings 9th National Conference on Artificial Intelligence 2:774–780

  8. Chrisman L (1992) Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. Proceedings AAAI'92, pp 183–188

  9. McCallum A (1995) Hidden state and reinforcement learning with instance-based state identification. Technical Report 502, University of Rochester

  10. Littman ML (1994) Memoryless policies: Theoretical limitations and practical results. Proceedings Third International Conference on Simulation of Adaptive Behavior (Animats 3), pp 238–245

  11. Onat A, Kita H, Nishikawa Y (1995) Reinforcement learning of dynamic behavior by using recurrent neural networks. Proceedings World Conference on Neural Networks 2:342–345

  12. Elman JL (1990) Finding structure in time. Cognit Sci 14:179–211

  13. Noda I (1994) Neural networks that learn symbolic and structured representation of information. Researches of the Electrotechnical Laboratory No. 968. Agency of Industrial Science and Technology, Japan

  14. Lin L (1992) Self-improving reactive agents based on reinforcement learning, planning and teaching. Mach Learn 8:293–321

  15. Werbos P (1974) Beyond regression: New tools for prediction and analysis in the behavioral sciences. PhD thesis, Harvard University

  16. Nishikawa Y, Kitamura S (eds) (1995) Neural networks as applied to measurement and control (in Japanese). Asakura, Japan

Author information

Correspondence to Ahmet Onat.

About this article

Cite this article

Onat, A., Kita, H. & Nishikawa, Y. Reinforcement learning of dynamic behavior by using recurrent neural networks. Artificial Life and Robotics 1, 117–121 (1997). https://doi.org/10.1007/BF02471125

