Abstract
The back-propagation (BP) training scheme is widely used for training network models in cognitive science despite its well-known technical and biological shortcomings. In this paper we contribute to making BP training more acceptable from a biological point of view in cognitively motivated prediction tasks by overcoming one of its major drawbacks.
Traditionally, recurrent neural networks in symbolic time-series prediction (e.g. language) are trained with gradient-descent-based learning algorithms, notably with back-propagation (BP) through time. A major drawback for the biological plausibility of BP is that it is a supervised scheme in which a teacher has to provide a fully specified target answer. Yet agents in natural environments often receive only a summary feedback about the degree of success or failure, a view adopted in reinforcement learning schemes.
In this work we show that, for simple recurrent networks in prediction tasks where there is a probability interpretation of the network's output vector, Elman BP can be reimplemented as a reinforcement learning scheme whose expected weight updates agree with those of traditional Elman BP, using ideas from the AGREL learning scheme (van Ooyen and Roelfsema 2003) for feed-forward networks.
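The core idea can be illustrated with a minimal numerical sketch (not the paper's exact AGREL derivation): for a softmax output layer, a REINFORCE-style scheme that samples one output symbol, receives a binary reward, and reinforces only the sampled action has an expected logit update proportional to the supervised cross-entropy BP gradient. All variable names here are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
z = rng.normal(size=5)      # logits of the output layer (hypothetical values)
y = softmax(z)              # output vector interpreted as next-symbol probabilities
c = 2                       # index of the correct next symbol
t = np.eye(5)[c]            # one-hot target used by supervised BP

# Supervised BP: gradient of the cross-entropy loss w.r.t. the logits.
bp_grad = y - t

# RL scheme: sample an action a ~ y, receive reward r = 1 if a == c else 0,
# and apply the REINFORCE update r * d(log y_a)/dz.  Here we compute its
# expectation analytically by summing over all possible actions.
exp_rl_update = np.zeros(5)
for a in range(5):
    r = 1.0 if a == c else 0.0
    grad_log = np.eye(5)[a] - y     # d log y_a / dz for softmax outputs
    exp_rl_update += y[a] * r * grad_log

# The expected RL update equals the (negated) BP gradient scaled by y_c,
# i.e. the two schemes agree in expectation up to a positive factor.
assert np.allclose(exp_rl_update, -y[c] * bp_grad)
```

The residual factor `y[c]` is why AGREL-style schemes additionally rescale the reward-prediction error on rewarded trials; with that correction the expected updates match BP exactly rather than only up to a state-dependent positive factor.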
An erratum to this chapter can be found at http://dx.doi.org/10.1007/11550907_163 .
References
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. Bradford Books, MIT Press (2002)
Wörgötter, F., Porr, B.: Temporal sequence learning, prediction, and control – a review of different models and their relation to biological mechanisms. Neural Computation 17, 245–319 (2005)
van Ooyen, A., Roelfsema, P.R.: A biologically plausible implementation of error-backpropagation for classification tasks. In: Kaynak, O., Alpaydin, E., Oja, E., Xu, L. (eds.) Artificial Neural Networks and Neural Information Processing – Supplementary Proceedings ICANN/ICONIP, Istanbul, pp. 442–444 (2003)
Elman, J.L.: Finding structure in time. Cognitive Science 14, 179–211 (1990)
Ellis, R., Humphreys, G.: Connectionist Psychology. Psychology Press, Hove (1999)
Williams, R.J., Peng, J.: An efficient gradient-based algorithm for on-line training of recurrent network trajectories. Neural Computation 2, 490–501 (1990)
Christiansen, M.H., Chater, N.: Toward a connectionist model of recursion in human linguistic performance. Cognitive Science 23, 157–205 (1999)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9, 1735–1780 (1997)
Schultz, W.: Predictive reward signal of dopamine neurons. J. Neurophysiol. 80, 1–27 (1998)
Carandini, M., Heeger, D.J.: Summation and division by neurons in primate visual cortex. Science 264, 1333–1336 (1994)
© 2005 Springer-Verlag Berlin Heidelberg
Grüning, A. (2005). Back-Propagation as Reinforcement in Prediction Tasks. In: Duch, W., Kacprzyk, J., Oja, E., Zadrożny, S. (eds) Artificial Neural Networks: Formal Models and Their Applications – ICANN 2005. ICANN 2005. Lecture Notes in Computer Science, vol 3697. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11550907_86
Print ISBN: 978-3-540-28755-1
Online ISBN: 978-3-540-28756-8