Abstract
Recent work by Siegelmann has shown that the computational power of recurrent neural networks matches that of Turing machines. One important implication is that complex language classes (infinite languages with embedded clauses) can be represented in neural networks. The proofs are based on a fractal encoding of states that simulates the memory and operations of a stack.
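As a rough illustration of that encoding idea (a minimal sketch only, not the exact construction used in Siegelmann's proofs), a binary stack can be packed into a single real number in [0, 1]: each push maps the current value into one of two disjoint sub-intervals, so the reachable states form a Cantor-like fractal set. The helper functions push, top and pop below are hypothetical and introduced purely for exposition.

```python
# Illustrative Cantor-style encoding of a binary stack in one real number.
# Pushing bit b maps state s to (s + 2b + 1) / 4, so a pushed 1 lands in
# [0.75, 1) and a pushed 0 in [0.25, 0.5); the top bit stays readable.

def push(state, bit):
    """Push a bit (0 or 1) onto the stack encoded by `state`."""
    return (state + 2 * bit + 1) / 4.0

def top(state):
    """Read the most recently pushed bit without removing it."""
    return 1 if state >= 0.5 else 0

def pop(state):
    """Remove the top bit, returning the encoding of the remaining stack."""
    return 4.0 * state - (2 * top(state) + 1)

s = 0.0
for b in (1, 0, 1):        # push 1, then 0, then 1
    s = push(s, b)
print(top(s))              # 1 (last bit pushed)
print(top(pop(s)))         # 0
print(top(pop(pop(s))))    # 1
```

The point of the fractal layout is that push and pop reduce to affine maps on the unit interval, which is the kind of operation a piecewise-linear recurrent unit can implement.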
In the present work, it is shown that similar stack-like dynamics can be learned by recurrent neural networks from simple sequence prediction tasks. Two main types of network solutions are found and described qualitatively as dynamical systems: damped oscillation and entangled spiraling around fixed points. The potential and limitations of each solution type are established in terms of generalization on two different context-free languages. Both solution types constitute novel stack implementations, broadly in line with Siegelmann's theoretical work, and offer insight into how the embedded structures of language can be handled in analog hardware.
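As a caricature of the damped-oscillation solution type (a sketch under assumed parameters, not one of the trained networks analysed in the paper), a single recurrent dimension can track the depth of a string such as a^n b^n: reading a's applies a contracting, sign-flipping map so the state spirals in toward a fixed point, and reading b's applies the expanding inverse map, so the state returns to its starting value only when the string is balanced. The names A_RATE, B_RATE and predict_sequence are hypothetical.

```python
# One-dimensional "damped oscillation" counter for a^n b^n (illustration only).
A_RATE = -0.6          # contracting, sign-flipping map used while reading a's
B_RATE = 1.0 / A_RATE  # expanding inverse map used while reading b's

def predict_sequence(n_a, n_b):
    """Return True iff the dynamics ends where it started, i.e. n_a == n_b."""
    state = 1.0
    for _ in range(n_a):   # a-phase: state oscillates and decays toward 0
        state *= A_RATE
    for _ in range(n_b):   # b-phase: state oscillates and grows back out
        state *= B_RATE
    return abs(state - 1.0) < 1e-9

print(predict_sequence(5, 5))   # True: balanced string a^5 b^5
print(predict_sequence(5, 4))   # False: unbalanced string a^5 b^4
```

In a trained network the two rates would only be approximately inverse, which is one reason such a solution generalizes to longer strings only up to a point.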
References
N. Chomsky, Syntactic Structures, Mouton, The Hague, 1957.
H.T. Siegelmann and E.D. Sontag, “On the computational power of neural nets,” Journal of Computer and System Sciences, vol. 50, no. 1, pp. 132–150, 1995.
H.T. Siegelmann, Neural Networks and Analog Computation: Beyond the Turing Limit, Birkhäuser, 1999.
S. Hölldobler, Y. Kalinke, and H. Lehmann, “Designing a counter: Another case study of dynamics and activation landscapes in recurrent networks,” in Proceedings of KI-97: Advances in Artificial Intelligence, Springer-Verlag, 1997, pp. 313–324.
M. Steijvers and P. Grünwald, “A recurrent network that performs a context-sensitive prediction task,” Technical Report NC-TR-96-035, NeuroCOLT, Royal Holloway, University of London, 1996.
J. Wiles and J.L. Elman, “Learning to count without a counter: A case study of dynamics and activation landscapes in recurrent networks,” in Proceedings of the Seventeenth Annual Meeting of the Cognitive Science Society, Lawrence Erlbaum, 1995, pp. 482–487.
B. Tonkes, A. Blair, and J. Wiles, “Inductive bias in context-free language learning,” in Proceedings of the Ninth Australian Conference on Neural Networks, 1998, pp. 52–56.
P. Rodriguez and J. Wiles, “Recurrent neural networks can learn to implement symbol-sensitive counting,” in Advances in Neural Information Processing Systems, vol. 10, edited by Michael I. Jordan, Michael J. Kearns, and Sara A. Solla, The MIT Press, 1998.
P. Rodriguez, J. Wiles, and J.L. Elman, “A recurrent neural network that learns to count,” Connection Science, vol. 11, no. 1, pp. 5–40, 1999.
M. Bodén, J. Wiles, B. Tonkes, and A. Blair, “Learning to predict a context-free language: Analysis of dynamics in recurrent hidden units,” in Proceedings of the International Conference on Artificial Neural Networks, Edinburgh, IEE, 1999, pp. 359–364.
M. Christiansen and N. Chater, “Toward a connectionist model of recursion in human linguistic performance,” Cognitive Science, vol. 23, pp. 157–205, 1999.
J.B. Pollack, “The induction of dynamical recognizers,” Machine Learning, vol. 7, p. 227, 1991.
C. Moore, “Dynamical recognizers: Real-time language recognition by analog computers,” Theoretical Computer Science, vol. 201, pp. 99–136, 1998.
M. Casey, “The dynamics of discrete-time computation, with application to recurrent neural networks and finite state machine extraction,” Neural Computation, vol. 8, no. 6, pp. 1135–1178, 1996.
P. Tino, B.G. Horne, C.L. Giles, and P.C. Collingwood, “Finite state machines and recurrent neural networks—automata and dynamical systems approaches,” in Neural Networks and Pattern Recognition, edited by J.E. Dayhoff and O. Omidvar, Academic Press, 1998, pp. 171–220.
M. Barnsley, Fractals Everywhere, Academic Press: Boston, 2nd edition, 1993.
R.L. Devaney, An Introduction to Chaotic Dynamical Systems, Addison-Wesley, 1989.
R.J. Williams and D. Zipser, “A learning algorithm for continually running fully recurrent neural networks,” Neural Computation, vol. 1, no. 2, pp. 270–280, 1989.
J.L. Elman, “Learning and development in neural networks: The importance of starting small,” Cognition, vol. 48, pp. 71–99, 1993.
M.W. Hirsch and S. Smale, Differential Equations, Dynamical Systems, and Linear Algebra, Academic Press: New York, 1974.
M. Bodén and J. Wiles, “Context-free and context-sensitive dynamics in recurrent neural networks,” Connection Science, vol. 12, no. 3, 2000.
Cite this article
Bodén, M., Blair, A. Learning the Dynamics of Embedded Clauses. Applied Intelligence 19, 51–63 (2003). https://doi.org/10.1023/A:1023816706954