Abstract
The back-propagation (BP) training scheme is widely used for training network models in cognitive science despite its well-known technical and biological shortcomings. In this paper we contribute to making BP training more acceptable from a biological point of view in cognitively motivated prediction tasks by overcoming one of its major drawbacks.
Traditionally, recurrent neural networks in symbolic time-series prediction (e.g. language) are trained with gradient-descent-based learning algorithms, notably with back-propagation (BP) through time. A major drawback for the biological plausibility of BP is that it is a supervised scheme in which a teacher has to provide a fully specified target answer. Yet agents in natural environments often receive only a summary feedback about the degree of success or failure, a view adopted in reinforcement learning schemes.
In this work we show that, for simple recurrent networks in prediction tasks where there is a probability interpretation of the network's output vector, Elman BP can be reimplemented as a reinforcement learning scheme whose expected weight updates agree with those of traditional Elman BP, using ideas from the AGREL learning scheme (van Ooyen and Roelfsema 2003) for feed-forward networks.
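The core idea can be illustrated with a minimal numerical sketch (not the paper's exact AGREL derivation): for a softmax output layer, a REINFORCE-style scheme that samples one output symbol, receives a binary reward, and reinforces only the sampled action has an expected logit update proportional to the supervised cross-entropy BP gradient. All variable names here are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
z = rng.normal(size=5)      # logits of the output layer (hypothetical values)
y = softmax(z)              # output vector interpreted as next-symbol probabilities
c = 2                       # index of the correct next symbol
t = np.eye(5)[c]            # one-hot target used by supervised BP

# Supervised BP: gradient of the cross-entropy loss w.r.t. the logits.
bp_grad = y - t

# RL scheme: sample an action a ~ y, receive reward r = 1 if a == c else 0,
# and apply the REINFORCE update r * d(log y_a)/dz.  Here we compute its
# expectation analytically by summing over all possible actions.
exp_rl_update = np.zeros(5)
for a in range(5):
    r = 1.0 if a == c else 0.0
    grad_log = np.eye(5)[a] - y     # d log y_a / dz for softmax outputs
    exp_rl_update += y[a] * r * grad_log

# The expected RL update equals the (negated) BP gradient scaled by y_c,
# i.e. the two schemes agree in expectation up to a positive factor.
assert np.allclose(exp_rl_update, -y[c] * bp_grad)
```

The residual factor `y[c]` is why AGREL-style schemes additionally rescale the reward-prediction error on rewarded trials; with that correction the expected updates match BP exactly rather than only up to a state-dependent positive factor.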
An erratum to this chapter can be found at http://dx.doi.org/10.1007/11550907_163 .
References
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. Bradford Books, MIT Press (2002)
Wörgötter, F., Porr, B.: Temporal sequence learning, prediction, and control – a review of different models and their relation to biological mechanisms. Neural Computation 17, 245–319 (2005)
van Ooyen, A., Roelfsema, P.R.: A biologically plausible implementation of error-backpropagation for classification tasks. In: Kaynak, O., Alpaydin, E., Oja, E., Xu, L. (eds.) Artificial Neural Networks and Neural Information Processing – Supplementary Proceedings ICANN/ICONIP, Istanbul, pp. 442–444 (2003)
Elman, J.L.: Finding structure in time. Cognitive Science 14, 179–211 (1990)
Ellis, R., Humphreys, G.: Connectionist Psychology. Psychology Press, Hove (1999)
Williams, R.J., Peng, J.: An efficient gradient-based algorithm for on-line training of recurrent network trajectories. Neural Computation 2, 490–501 (1990)
Christiansen, M.H., Chater, N.: Toward a connectionist model of recursion in human linguistic performance. Cognitive Science 23, 157–205 (1999)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9, 1735–1780 (1997)
Schultz, W.: Predictive reward signal of dopamine neurons. J. Neurophysiol. 80, 1–27 (1998)
Carandini, M., Heeger, D.J.: Summation and division by neurons in primate visual cortex. Science 264, 1333–1336 (1994)
© 2005 Springer-Verlag Berlin Heidelberg
Grüning, A. (2005). Back-Propagation as Reinforcement in Prediction Tasks. In: Duch, W., Kacprzyk, J., Oja, E., Zadrożny, S. (eds) Artificial Neural Networks: Formal Models and Their Applications – ICANN 2005. ICANN 2005. Lecture Notes in Computer Science, vol 3697. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11550907_86
Print ISBN: 978-3-540-28755-1
Online ISBN: 978-3-540-28756-8