Abstract
For progress toward human-like intelligence in robots, autonomous and purposive learning of adaptive memory function is important. The combination of reinforcement learning (RL) and a recurrent neural network (RNN) seems promising for this, but it has not been applied to a continuous state-action space task, nor have its internal representations been analyzed in depth. In this paper, it is shown that, in a continuous state-action space task, a robot learned to memorize necessary information and to behave appropriately according to it, even though no special technique other than RL and an RNN was used. Three types of hidden neurons that appeared to contribute to remembering the necessary information were observed. Furthermore, by manipulating them, the robot changed its behavior as if the memorized information had been forgotten or swapped. This suggests the potential for the emergence of higher functions in this very simple learning system.
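The mechanism the abstract describes rests on an Elman-style recurrent network whose hidden state is fed back as context, so that the same observation can produce different actions depending on what was seen earlier. The sketch below is not the authors' code: the layer sizes, random weights, actor/critic heads, and cue vectors are all illustrative assumptions, and training (actor-critic RL with backpropagation through the recurrent connections) is omitted. It only demonstrates the forward dynamics that make memorization possible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative sizes: sensor inputs, hidden units, action outputs.
N_IN, N_HID, N_OUT = 4, 8, 2

W_in  = rng.normal(0, 0.5, (N_HID, N_IN))    # input -> hidden
W_ctx = rng.normal(0, 0.5, (N_HID, N_HID))   # context (previous hidden) -> hidden
W_act = rng.normal(0, 0.5, (N_OUT, N_HID))   # hidden -> actor (continuous action)
w_cri = rng.normal(0, 0.5, N_HID)            # hidden -> critic (scalar value)

def step(x, h_prev):
    """One time step: the previous hidden state acts as Elman context units."""
    h = np.tanh(W_in @ x + W_ctx @ h_prev)   # new hidden state
    action = np.tanh(W_act @ h)              # continuous action in [-1, 1]
    value = float(w_cri @ h)                 # critic's value estimate
    return h, action, value

# Present two different cues, then the same blank observation: the actions
# differ because the hidden state carries the earlier cue forward.
h0 = np.zeros(N_HID)
cue_a = np.array([1.0, 0.0, 0.0, 0.0])
cue_b = np.array([0.0, 1.0, 0.0, 0.0])
blank = np.zeros(N_IN)

h_a, _, _ = step(cue_a, h0)
h_b, _, _ = step(cue_b, h0)
_, act_after_a, _ = step(blank, h_a)
_, act_after_b, _ = step(blank, h_b)
print(np.allclose(act_after_a, act_after_b))  # False: context retains the cue
```

Zeroing or swapping parts of `h` between steps would mimic the manipulation experiments mentioned in the abstract, where the robot behaved as if the memorized information had been forgotten or exchanged.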
© 2009 Springer-Verlag Berlin Heidelberg
Utsunomiya, H., Shibata, K. (2009). Contextual Behaviors and Internal Representations Acquired by Reinforcement Learning with a Recurrent Neural Network in a Continuous State and Action Space Task. In: Köppen, M., Kasabov, N., Coghill, G. (eds) Advances in Neuro-Information Processing. ICONIP 2008. Lecture Notes in Computer Science, vol 5507. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03040-6_118
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03039-0
Online ISBN: 978-3-642-03040-6