Abstract
The increased availability of text corpora and the growth of connectionism have stimulated renewed interest in probabilistic models of language processing in computational linguistics and psycholinguistics. The Simple Recurrent Network (SRN) is an important connectionist model because it can, in principle, learn temporal dependencies of unspecified length. In addition, many computational questions about the SRN's ability to learn dependencies between individual items extend to other models. This paper reports experiments with an SRN trained on a large corpus and examines the network's ability to learn bigrams, trigrams, and higher-order n-grams as a function of corpus size. Performance is evaluated with an information-theoretic measure of prediction (or guess) ranking and with output-vector entropy. Given enough training and hidden units, the SRN learns 5- and 6-gram dependencies, although learning an n-gram is contingent on its frequency and on the relative frequencies of other n-grams. In some cases the network learns relatively low-frequency deep dependencies before relatively high-frequency short ones, provided the deep dependencies do not require representational shifts in hidden-unit space.
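The two evaluation measures named above can be made concrete with a minimal sketch. This is not the paper's implementation: the layer sizes, weight scales, and function names below are illustrative assumptions. It shows an Elman-style SRN forward step (one-hot input, tanh hidden layer fed back as context, softmax output over the vocabulary), the entropy of the output vector in bits, and the rank of the correct next word among the network's guesses.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

class SRN:
    """Minimal Elman-style simple recurrent network (forward pass only).

    Sizes and initialization scale are illustrative, not from the paper.
    """
    def __init__(self, vocab, hidden, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.1
        self.Wxh = rng.normal(0, scale, (hidden, vocab))   # input -> hidden
        self.Whh = rng.normal(0, scale, (hidden, hidden))  # context -> hidden
        self.Why = rng.normal(0, scale, (vocab, hidden))   # hidden -> output
        self.h = np.zeros(hidden)                          # context units

    def step(self, word_index):
        """Consume one word (by index); return the predicted distribution
        over the next word."""
        x = np.zeros(self.Wxh.shape[1])
        x[word_index] = 1.0
        self.h = np.tanh(self.Wxh @ x + self.Whh @ self.h)
        return softmax(self.Why @ self.h)

def entropy_bits(p):
    """Entropy of the output vector, in bits; zero terms contribute nothing."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def guess_rank(p, target):
    """Rank of the target word in the sorted predictions (1 = top guess)."""
    order = np.argsort(-p)
    return int(np.where(order == target)[0][0]) + 1
```

With untrained weights the output is close to uniform, so its entropy is near log2(vocab); as an n-gram is learned, probability mass concentrates on the continuation, entropy drops, and the target's guess rank approaches 1.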
Rodriguez, P. Comparing Simple Recurrent Networks and n-Grams in a Large Corpus. Applied Intelligence 19, 39–50 (2003). https://doi.org/10.1023/A:1023864622883