Abstract
This paper presents a system that transfers the results of prior learning to speed up reinforcement learning in a changing world. Often, even when the change to the world is relatively small, an extensive relearning effort is required. The new system exploits strong features in the multi-dimensional function produced by reinforcement learning. These features induce a partitioning of the state space, which is represented as a graph. The graph is used to index and compose functions stored in a case base, forming a close approximation to the solution of the new task. The experiments investigate one important example of a changing world: a new goal position. In this situation, the system achieves close to a two orders of magnitude increase in learning rate over a basic reinforcement learning algorithm.
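To make the setting concrete, the sketch below shows the general idea of reusing a previously learned value function when only the goal position changes. It is not the paper's case-base composition algorithm; it is a minimal tabular Q-learning illustration in which a stored Q-table from an earlier task seeds learning for the new goal, and all names, sizes, and parameters are illustrative assumptions.

    import random

    # Minimal sketch (not the paper's algorithm): tabular Q-learning on a small
    # grid world, where a Q-table learned for an old goal is reused to
    # initialise learning for a new goal. All values are illustrative.

    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
    SIZE = 8

    def step(state, action, goal):
        # Deterministic grid dynamics; reward 1 at the goal, 0 elsewhere.
        r, c = state
        dr, dc = action
        nr = min(max(r + dr, 0), SIZE - 1)
        nc = min(max(c + dc, 0), SIZE - 1)
        next_state = (nr, nc)
        reward = 1.0 if next_state == goal else 0.0
        return next_state, reward, next_state == goal

    def q_learning(goal, episodes, q=None, alpha=0.1, gamma=0.95, eps=0.1):
        # Standard Q-learning; the optional q argument lets a stored "case"
        # seed the value function instead of starting from zero.
        q = dict(q) if q else {}
        for _ in range(episodes):
            state, done = (0, 0), False
            while not done:
                if random.random() < eps:
                    a = random.randrange(len(ACTIONS))
                else:
                    a = max(range(len(ACTIONS)),
                            key=lambda i: q.get((state, i), 0.0))
                nxt, reward, done = step(state, ACTIONS[a], goal)
                best_next = max(q.get((nxt, i), 0.0)
                                for i in range(len(ACTIONS)))
                old = q.get((state, a), 0.0)
                q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
                state = nxt
        return q

    # Learn the original task and store the result as a case, then reuse it as
    # the starting point when the goal moves; far fewer episodes are needed
    # than when learning the new task from scratch.
    case = q_learning(goal=(7, 7), episodes=500)
    q_new = q_learning(goal=(7, 0), episodes=50, q=case)

The paper's contribution goes beyond this kind of warm start: it decomposes the learned function along strong features into pieces indexed by a graph, so that stored pieces can be recombined for the new goal rather than simply copied.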
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Drummond, C. (1998). Composing functions to speed up reinforcement learning in a changing world. In: Nédellec, C., Rouveirol, C. (eds) Machine Learning: ECML-98. ECML 1998. Lecture Notes in Computer Science, vol 1398. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0026708
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64417-0
Online ISBN: 978-3-540-69781-7