Abstract
The increased availability of text corpora and the growth of connectionism have stimulated renewed interest in probabilistic models of language processing in computational linguistics and psycholinguistics. The Simple Recurrent Network (SRN) is an important connectionist model because it can, in principle, learn temporal dependencies of unspecified length. In addition, many computational questions about the SRN's ability to learn dependencies between individual items extend to other models. This paper reports experiments with an SRN trained on a large corpus and examines the network's ability to learn bigrams, trigrams, and higher-order n-grams as a function of corpus size. Performance is evaluated with an information-theoretic measure of prediction (or guess) ranking and with output-vector entropy. Given enough training and hidden units, the SRN learns 5- and 6-gram dependencies, although learning an n-gram is contingent on its frequency and on the relative frequencies of other n-grams. In some cases the network learns relatively low-frequency deep dependencies before relatively high-frequency short ones, provided the deep dependencies do not require representational shifts in hidden-unit space.
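The two evaluation measures named above can be made concrete with a minimal sketch. This is not the paper's implementation: the layer sizes, weight scales, and function names below are illustrative assumptions. It shows an Elman-style SRN forward step (one-hot input, tanh hidden layer fed back as context, softmax output over the vocabulary), the entropy of the output vector in bits, and the rank of the correct next word among the network's guesses.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

class SRN:
    """Minimal Elman-style simple recurrent network (forward pass only).

    Sizes and initialization scale are illustrative, not from the paper.
    """
    def __init__(self, vocab, hidden, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.1
        self.Wxh = rng.normal(0, scale, (hidden, vocab))   # input -> hidden
        self.Whh = rng.normal(0, scale, (hidden, hidden))  # context -> hidden
        self.Why = rng.normal(0, scale, (vocab, hidden))   # hidden -> output
        self.h = np.zeros(hidden)                          # context units

    def step(self, word_index):
        """Consume one word (by index); return the predicted distribution
        over the next word."""
        x = np.zeros(self.Wxh.shape[1])
        x[word_index] = 1.0
        self.h = np.tanh(self.Wxh @ x + self.Whh @ self.h)
        return softmax(self.Why @ self.h)

def entropy_bits(p):
    """Entropy of the output vector, in bits; zero terms contribute nothing."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def guess_rank(p, target):
    """Rank of the target word in the sorted predictions (1 = top guess)."""
    order = np.argsort(-p)
    return int(np.where(order == target)[0][0]) + 1
```

With untrained weights the output is close to uniform, so its entropy is near log2(vocab); as an n-gram is learned, probability mass concentrates on the continuation, entropy drops, and the target's guess rank approaches 1.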
Rodriguez, P. Comparing Simple Recurrent Networks and n-Grams in a Large Corpus. Applied Intelligence 19, 39–50 (2003). https://doi.org/10.1023/A:1023864622883