
Comparing Simple Recurrent Networks and n-Grams in a Large Corpus


Abstract

The increased availability of text corpora and the growth of connectionism have stimulated a renewed interest in probabilistic models of language processing in computational linguistics and psycholinguistics. The Simple Recurrent Network (SRN) is an important connectionist model because it has the potential to learn temporal dependencies of unspecified length. In addition, many computational questions about the SRN's ability to learn dependencies between individual items extend to other models. This paper reports on experiments with an SRN trained on a large corpus and examines the network's ability to learn bigrams, trigrams, and higher-order n-grams as a function of the size of the corpus. Performance is evaluated by an information-theoretic measure of prediction (or guess) ranking and by the entropy of the output vector. With enough training and hidden units the SRN shows the ability to learn 5- and 6-gram dependencies, although learning an n-gram is contingent on its frequency and on the relative frequency of other n-grams. In some cases, the network will learn relatively low-frequency deep dependencies before relatively high-frequency short ones if the deep dependencies do not require representational shifts in hidden unit space.
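The two evaluation measures named above, the prediction (guess) rank and the entropy of the output vector, can be illustrated with a short sketch. The snippet below is not the paper's implementation; the toy vocabulary, the network size, and the untrained random weights are assumptions used only to show how an Elman-style SRN step produces a next-word distribution from which both measures are computed.

    # Minimal sketch (assumptions throughout): an untrained Elman-style SRN
    # step, the rank of the correct next word in its output distribution,
    # and the Shannon entropy of that distribution.
    import numpy as np

    rng = np.random.default_rng(0)

    vocab = ["the", "cat", "sat", "on", "mat"]   # toy vocabulary (assumption)
    V, H = len(vocab), 8                         # vocabulary size, hidden units

    # Randomly initialised weights, for illustration only.
    W_xh = rng.normal(scale=0.1, size=(H, V))    # input  -> hidden
    W_hh = rng.normal(scale=0.1, size=(H, H))    # context -> hidden
    W_hy = rng.normal(scale=0.1, size=(V, H))    # hidden -> output

    def one_hot(i, n):
        v = np.zeros(n)
        v[i] = 1.0
        return v

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def srn_step(x, h_prev):
        """One SRN time step: new hidden state and next-word distribution."""
        h = np.tanh(W_xh @ x + W_hh @ h_prev)
        return h, softmax(W_hy @ h)

    def entropy_bits(p):
        """Shannon entropy (in bits) of an output probability vector."""
        p = p[p > 0]
        return float(-(p * np.log2(p)).sum())

    def guess_rank(p, target_idx):
        """Rank (1 = best) of the correct next word under the prediction."""
        order = np.argsort(-p)
        return int(np.where(order == target_idx)[0][0]) + 1

    # Feed a toy sentence and score the prediction of each next word.
    sentence = ["the", "cat", "sat", "on", "the", "mat"]
    h = np.zeros(H)
    for w, nxt in zip(sentence[:-1], sentence[1:]):
        h, y = srn_step(one_hot(vocab.index(w), V), h)
        print(f"{w:>4} -> {nxt:<4} rank={guess_rank(y, vocab.index(nxt))} "
              f"entropy={entropy_bits(y):.2f} bits")

With trained weights, a falling guess rank and falling output entropy for the word that completes a particular n-gram would indicate that the dependency has been learned; the sketch only shows where those two numbers come from.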




Cite this article

Rodriguez, P. Comparing Simple Recurrent Networks and n-Grams in a Large Corpus. Applied Intelligence 19, 39–50 (2003). https://doi.org/10.1023/A:1023864622883
