Abstract
Reinforcement learning is a learning scheme for finding the optimal policy to control a system, based on a scalar signal representing a reward or a punishment. If the controller's observation of the system is rich enough to represent the system's internal state, the controller can achieve the optimal policy simply by learning reactive behavior. However, if the state of the controlled system cannot be assessed completely from the current sensory observations, the controller must learn a dynamic behavior to achieve the optimal policy.
In this paper, we propose a dynamic controller scheme that maintains a memory of past system outputs to uncover hidden states, and makes control decisions based on that memory. The scheme integrates Q-learning, as proposed by Watkins, with recurrent neural networks of several types. It performs favorably in simulations involving a task with hidden states.
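As an informal illustration of the idea (not the authors' exact architecture), the sketch below combines Q-learning with an Elman-style recurrent network on a toy hidden-state task: the agent sees a distinguishing cue at the first step, then an aliased observation at the second step, where the rewarded action equals the cue. A reactive policy acting on the aliased observation alone cannot beat chance, but the recurrent hidden state carries the cue forward. The network sizes, the toy task, and the simplification of training only the linear readout (echo-state style, instead of full backpropagation through time) are assumptions made to keep the sketch short.

```python
import numpy as np

rng = np.random.default_rng(0)

N_OBS, N_HID, N_ACT = 2, 8, 2
W_in = rng.normal(0.0, 0.5, (N_HID, N_OBS))   # observation -> hidden
W_rec = rng.normal(0.0, 0.5, (N_HID, N_HID))  # hidden -> hidden (the memory)
W_out = np.zeros((N_ACT, N_HID))              # hidden -> Q-values (trained)

ALIASED_OBS = np.array([0.5, 0.5])  # identical observation for both cues at t=1

def forward(cue):
    """Run one two-step episode through the recurrent net; return final h and Q."""
    h = np.zeros(N_HID)
    obs0 = np.eye(N_OBS)[cue]                    # distinguishing cue at t=0
    h = np.tanh(W_in @ obs0 + W_rec @ h)
    h = np.tanh(W_in @ ALIASED_OBS + W_rec @ h)  # aliased step: h still encodes the cue
    return h, W_out @ h

# Q-learning update applied to the linear readout only (echo-state simplification).
LR, EPS = 0.1, 0.2
for episode in range(3000):
    cue = int(rng.integers(N_ACT))
    h, q = forward(cue)
    a = int(rng.integers(N_ACT)) if rng.random() < EPS else int(np.argmax(q))
    r = 1.0 if a == cue else 0.0              # reward only for recalling the cue
    delta = r - q[a]                          # terminal step, so the target is just r
    W_out[a] += LR * delta * h

greedy = {cue: int(np.argmax(forward(cue)[1])) for cue in range(N_ACT)}
```

After training, the greedy action recovers the cue for both cue values, even though the observation at decision time is identical in the two cases; this is precisely the kind of task a memoryless controller cannot solve.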
References
Sutton RS (1988) Learning to predict by the methods of temporal differences. Mach Learn 3:9–44
Watkins C, Dayan P (1992) Q-learning. Mach Learn 8:279–292
Barto AG, Bradtke SJ, Singh SP (1993) Learning to act using real-time dynamic programming. Technical Report, Department of Computer Science, University of Massachusetts
Werbos P (1990) A menu of designs for reinforcement learning over time. Neural networks for control. MIT Press, Cambridge, pp. 67–95
Whitehead SD, Ballard DH (1990) Active perception and reinforcement learning. Neural Comput 2:409–419
Whitehead SD, Ballard DH (1991) Learning to perceive and act by trial and error. Mach Learn 7:45–83
Tan M (1991) Cost-sensitive reinforcement learning for adaptive classification and control. Proceedings 9th National Conference on Artificial Intelligence 2:774–780
Chrisman L (1992) Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. Proceedings AAAI'92, pp 183–188
McCallum A (1995) Hidden state and reinforcement learning with instance-based state identification. University of Rochester, Technical Report 502
Littman ML (1994) Memoryless policies: Theoretical limitations and practical results. Proceedings Third International Conference on Simulation of Adaptive Behavior (Animats 3), pp 238–245
Onat A, Kita H, Nishikawa Y (1995) Reinforcement learning of dynamic behavior by using recurrent neural networks. Proceedings World Conference on Neural Networks 2:342–345
Elman JL (1990) Finding structure in time. Cognit Sci 14:179–211
Noda I (1994) Neural networks that learn symbolic and structured representation of information. Researches of the Electrotechnical Laboratory No. 968. Agency of Industrial Science and Technology, Japan
Lin L (1992) Self-improving reactive agents based on reinforcement learning, planning and teaching. Mach Learn 8:293–321
Werbos P (1974) Beyond regression: New tools for prediction and analysis in the behavioral sciences. PhD Thesis. Harvard University
Nishikawa Y, Kitamura S (eds) (1995) Neural networks as applied to measurement and control (in Japanese). Asakura, Japan
Cite this article
Onat, A., Kita, H. & Nishikawa, Y. Reinforcement learning of dynamic behavior by using recurrent neural networks. Artificial Life and Robotics 1, 117–121 (1997). https://doi.org/10.1007/BF02471125