Abstract
This paper presents work on using hierarchical long-term memory to reduce the memory requirements of nearest sequence memory (NSM) learning, a previously published, instance-based reinforcement learning algorithm. A hierarchical memory representation reduces memory requirements by allowing traces to share common sub-sequences. We present moderated mechanisms for estimating discounted future rewards and for dealing with hidden state using hierarchical memory. We also present an experimental analysis of how sub-sequence length affects the memory compression achieved, and show that the reduced memory requirements do not affect the speed of learning. Finally, we analyse and discuss the persistence of the sub-sequences independent of specific trace instances.
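To make the compression idea concrete, here is a minimal illustrative sketch (not the paper's implementation — the class names, step representation, and prefix-based sharing are assumptions for illustration; NSM proper matches suffixes of traces) of how traces stored in a shared tree reuse nodes for common sub-sequences instead of duplicating them:

```python
# Hypothetical sketch: experience traces stored in a prefix tree so that
# traces sharing common sub-sequences reuse nodes rather than storing
# each step separately, reducing total memory.
class TraceNode:
    def __init__(self, step):
        self.step = step        # an (observation, action) pair
        self.children = {}      # next step -> TraceNode


class TraceTrie:
    def __init__(self):
        self.roots = {}         # first step -> TraceNode
        self.node_count = 0     # nodes actually allocated

    def add_trace(self, trace):
        """Insert a trace (a list of (obs, action) steps), sharing any
        sub-sequence prefix already present in the tree."""
        level = self.roots
        for step in trace:
            if step not in level:
                level[step] = TraceNode(step)
                self.node_count += 1
            level = level[step].children


trie = TraceTrie()
traces = [
    [("o1", "a1"), ("o2", "a1"), ("o3", "a2")],
    [("o1", "a1"), ("o2", "a1"), ("o4", "a1")],  # shares a 2-step prefix
    [("o1", "a1"), ("o2", "a1"), ("o3", "a2")],  # fully shared
]
flat_steps = 0
for t in traces:
    flat_steps += len(t)
    trie.add_trace(t)

print(flat_steps)       # 9 steps if each trace were stored flat
print(trie.node_count)  # 4 nodes with sub-sequence sharing
```

The compression grows with the amount of overlap between traces, which is why the paper's analysis of sub-sequence length matters: longer shared sub-sequences mean fewer duplicated nodes.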
© 2011 Springer-Verlag London Limited
Dahl, T.S. (2011). Hierarchical Traces for Reduced NSM Memory Requirements. In: Bramer, M., Petridis, M., Hopgood, A. (eds) Research and Development in Intelligent Systems XXVII. SGAI 2010. Springer, London. https://doi.org/10.1007/978-0-85729-130-1_12
Print ISBN: 978-0-85729-129-5
Online ISBN: 978-0-85729-130-1