Abstract
The generalization ability of discrete-time partially recurrent networks is examined. It is well known that the VC dimension of recurrent networks is infinite in most interesting cases, so the standard VC analysis cannot be applied directly. We establish guarantees for specific situations in which the transition function forms a contraction or the probability of long inputs is restricted. For the general case, we derive posterior bounds which take the input data into account. They are obtained via a generalization of the luckiness framework to the agnostic setting. The general formalism makes it possible to focus on representative parts of the data as well as to handle more general situations such as long-term prediction.
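To fix ideas, the setting can be sketched as follows (an illustrative formalization with assumed notation, not quoted from the paper): a discrete-time recurrent network processes an input sequence $x_1, \dots, x_T$ through a state-transition function $f$ and a readout $g$,

$$ s_t = f(s_{t-1}, x_t), \qquad \hat{y} = g(s_T), \qquad s_0 \text{ fixed.} $$

Under this notation, the contraction condition referred to above would require $\|f(s, x) - f(s', x)\| \le C\,\|s - s'\|$ for all inputs $x$ and some constant $C < 1$, so that the influence of early states decays along long sequences.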
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Hammer, B. (2001). On the Generalization Ability of Recurrent Networks. In: Dorffner, G., Bischof, H., Hornik, K. (eds) Artificial Neural Networks — ICANN 2001. ICANN 2001. Lecture Notes in Computer Science, vol 2130. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44668-0_102
DOI: https://doi.org/10.1007/3-540-44668-0_102
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42486-4
Online ISBN: 978-3-540-44668-2