Abstract
Traditional theories of grammar, as well as computational modelling of language acquisition, have focused on aspects of either word learning or grammar learning. Work on intermediate linguistic constructions (the area between words and combinatory grammar rules) has been very limited. Although recent usage-based theories of language learning emphasize the role of multiword constructions, much remains to be explored concerning the precise computational mechanisms by which children learn to identify and interpret different types of multiword lexemes. The goal of the current study is to bring in ideas from computational linguistics on identifying multiword lexemes, and to explore whether these ideas extend naturally to the domain of child language acquisition. We take a first step toward computational modelling of the acquisition of a widely documented class of multiword verbs, such as take the train and give a kiss, that children must master early in language learning. Specifically, we show that simple statistics based on the linguistic properties of these multiword verbs are informative for identifying them in a corpus of child-directed utterances. We also present preliminary experiments demonstrating that such statistics can be used within a word learning model to learn associations between meanings and sequences of words.
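The word learning model mentioned in the abstract builds on the incremental cross-situational model of Fazly et al. [33], which Notes 6 and 7 refer to. The following is a minimal sketch of how such a model can learn word-meaning associations, under our reading of its update rules rather than the authors' code. Each utterance is paired with a scene (a set of meaning symbols). The learner aligns every meaning with every word in proportion to the current p(m | w), accumulates these alignments as association strengths, and re-estimates a smoothed p(m | w). The class name `CrossSituationalLearner`, the parameter names `beta` and `lam` and their values, and the meaning symbols `A`, `KISS` and `BALL` are hypothetical. The chapter's extension to sequences of words is not shown.

```python
class CrossSituationalLearner:
    """Hypothetical sketch of an incremental cross-situational word learner.

    Each input pairs an utterance (treated as an unordered set of word types,
    ignoring syntax) with a scene (a set of meaning symbols).
    """

    def __init__(self, beta=100, lam=1e-5):
        self.beta = beta        # assumed upper bound on the number of meaning symbols
        self.lam = lam          # assumed small smoothing constant
        self.assoc = {}         # (word, meaning) -> accumulated alignment strength
        self.assoc_total = {}   # word -> sum of its association strengths

    def prob(self, meaning, word):
        """Smoothed p(meaning | word); equals 1/beta for a word never seen before."""
        a = self.assoc.get((word, meaning), 0.0)
        total = self.assoc_total.get(word, 0.0)
        return (a + self.lam) / (total + self.beta * self.lam)

    def update(self, utterance, scene):
        words = set(utterance)
        if not words:
            return
        # Align each meaning with each word in proportion to the current p(m | w),
        # computing every alignment from the probabilities *before* this update.
        alignments = {}
        for m in scene:
            denom = sum(self.prob(m, w) for w in words)
            for w in words:
                alignments[(w, m)] = self.prob(m, w) / denom
        # Accumulate the alignments into association strengths.
        for (w, m), a in alignments.items():
            self.assoc[(w, m)] = self.assoc.get((w, m), 0.0) + a
            self.assoc_total[w] = self.assoc_total.get(w, 0.0) + a


learner = CrossSituationalLearner()
learner.update(["a", "kiss"], {"A", "KISS"})
learner.update(["a", "ball"], {"A", "BALL"})
print(learner.prob("BALL", "ball") > learner.prob("A", "ball"))  # True
```

After the second utterance, the learner has already tied "a" to `A`, so it attributes most of `BALL` to the new word "ball". This shows how cross-situational evidence accumulates without any explicit word-meaning supervision.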
Notes
- 1.
A compositional approach to take the train would depend on knowledge of a very specialized meaning of take, one restricted to occur with a narrow range of objects; such a meaning is essentially an alternative lexicalization of the necessary knowledge. See Fazly et al. [31] for a computational approach to the restricted productivity of such expressions.
- 2.
- 3.
The choice of verb can vary among dialects of the language; for example, British speakers typically say take a decision instead of make a decision and have a nap instead of take a nap.
- 4.
Although it remains to be tested whether children actually do this, a construction grammar approach to language acquisition, as in Goldberg [41], supports this type of calculation, since the learner would keep track of which nouns can occur in which constructions.
- 5.
For example, the verb–noun pair give-hand may occur as an abs usage (give me a hand cleaning up) or as a lit usage (give me Mr. PotatoHead’s hand or give me your pretty hands). In most cases of such potential ambiguity, the annotator had a clear intuition about which usage would predominate, since the alternative would be odd to find in child-directed speech (CDS). In some cases, such as give-hand, the actual corpus usages were examined to determine the most frequent class.
- 6.
The original model of Fazly et al. treats utterances as unordered bags of words, ignoring syntactic information. Syntax is arguably a valuable source of knowledge for word learning in children (e.g., [39, 56]). In a preliminary study, Alishahi and Fazly [2] also show that the word learning model can potentially benefit from knowledge of syntactic categories. Such information might be necessary for the acquisition of multiword lexemes and should be investigated further in future work.
- 7.
Following Fazly et al. [33], we assume that words such as a and is also have corresponding meaning symbols in the scene. Linguists often consider such words to have a mainly grammatical function. However, it is reasonable to assume that language learners perceive some aspects of their meaning from the scene (e.g., definite/indefinite for a determiner such as a, and state/action for the verb be).
- 8.
We did not incorporate the Fixed measure into this probability, because this measure depends on the usage patterns of a pair across several occurrences, and many of the experimental items in this corpus occur only once or twice.
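Note 8 rests on the fact that a fixedness score describes the distribution of a verb–noun pair over usage patterns, not any single token. The following is a minimal sketch. It assumes a KL-divergence formulation in the spirit of the fixedness measures of Fazly and Stevenson (see the references), not necessarily the exact Fixed measure of this chapter. The function names `pattern_prior` and `fixedness`, the determiner labels "a" and "the", and the toy counts are all hypothetical.

```python
import math
from collections import Counter


def pattern_prior(all_patterns):
    """Relative frequency of each usage pattern over all verb-noun tokens."""
    counts = Counter(all_patterns)
    total = len(all_patterns)
    return {pt: c / total for pt, c in counts.items()}


def fixedness(pair_patterns, prior):
    """KL divergence D(P(pt | v, n) || P(pt)) in bits (hypothetical formulation).

    High values mean the pair's distribution over patterns departs from the
    overall distribution (e.g., it is concentrated on a single pattern).
    Every pattern in pair_patterns must have a nonzero prior probability.
    """
    counts = Counter(pair_patterns)
    n = len(pair_patterns)
    return sum((c / n) * math.log2((c / n) / prior[pt]) for pt, c in counts.items())


prior = pattern_prior(["a"] * 5 + ["the"] * 5)        # toy determiner counts
print(fixedness(["a"] * 4, prior))                    # 1.0: always used with "a"
print(fixedness(["a", "the", "a", "the"], prior))     # 0.0: matches the prior
print(fixedness(["the"], prior))                      # 1.0: one token looks fully fixed
```

The last call illustrates the problem described in Note 8. With a single occurrence, the distribution of the pair is a point mass, so it scores exactly as fixed as a pair seen four times with the same determiner. Such a score says little about the pair's true behaviour.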
References
Alba-Salas, J. (2002). Light verb constructions in Romance: A syntactic analysis. Ph.D. thesis, Cornell University.
Alishahi, A., & Fazly, A. (2010). Integrating syntactic knowledge into a model of cross-situational word learning. In Proceedings of CogSci’2010, Portland.
Alishahi, A., & Stevenson, S. (2008). A computational model of early argument structure acquisition. Cognitive Science: A Multidisciplinary Journal, 32(5), 789–834.
Alishahi, A., & Stevenson, S. (2011). Gradual acquisition of verb selectional preferences in a Bayesian model. In Poibeau et al. (2011).
Arnon, I., & Snider, N. (2010). More than words: Frequency effects for multi-word phrases. Journal of Memory and Language, 62(1), 67–82.
Bannard, C. (2007). A measure of syntactic flexibility for automatically identifying multiword expressions in corpora. In Multiword Expression’07: Proceedings of the Workshop on a Broader Perspective on Multiword Expressions (pp. 1–8). Prague: Association for Computational Linguistics.
Bannard, C., & Matthews, D. (2008). Stored word sequences in language learning: The effect of familiarity on children’s repetition of four-word combinations. Psychological Science, 19(3), 241–248.
Bannard, C., Baldwin, T., & Lascarides, A. (2003). A statistical approach to the semantics of verb-particles. In Proceedings of the ACL-SIGLEX Workshop on Multiword Expressions: Analysis, Acquisition and Treatment (pp. 65–72), Sapporo.
Borensztajn, G., Zuidema, W., & Bod, R. (2009). Children’s grammars grow more abstract with age – evidence from an automatic procedure for identifying the productive units of language. Topics in Cognitive Science, 1(1), 175–188.
Brown, R. (1957). Linguistic determinism and the part of speech. Journal of Abnormal Psychology, 55(1), 1–5.
Brown, R. (1973). A first language: The early stages. Cambridge: Harvard University Press.
Butt, M. (1997). Aspectual complex predicates, passives and dispositionability. In Talk Held at the 1997 Meeting of the Linguistics Association of Great Britain (LAGB’97), University of Essex. http://ling.uni-konstanz.de/pages/home/butt/.
Chang, N. (2004). Putting meaning into grammar learning. In Proceedings of the ACL’04 Workshop on Psycho-Computational Models of Human Language Acquisition (pp. 17–24), Geneva.
Church, K., Gale, W., Hanks, P., & Hindle, D. (1991). Using statistics in lexical analysis. In U. Zernik (Ed.), Lexical acquisition: Exploiting on-line resources to build a lexicon (pp. 115–164). Hillsdale: Erlbaum.
Claridge, C. (2000). Multiword verbs in Early Modern English. Language and Computers 32. New York: Rodopi.
Clark, E. V. (1996). Early verbs, event-types, and inflections. In C. E. Johnson & J. H. V. Gilbert (Eds.), Children’s language (Vol. 9, pp. 61–73). Mahwah: Erlbaum.
Clark, A. (2001). Unsupervised induction of stochastic context free grammars with distributional clustering. In Proceedings of Conference on Computational Natural Language Learning (pp. 105–112), Toulouse.
Connor, M., Fisher, C., & Roth, D. (2011). Starting from scratch in semantic role labeling: Early indirect supervision. In Poibeau et al. (2011).
Cook, P., & Stevenson, S. (2006). Classifying particle semantics in English verb-particle constructions. In Proceedings of the COLING-ACL Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties (pp. 45–53), Sydney.
Cowie, A. P. (1981). The treatment of collocations and idioms in learner’s dictionaries. Applied Linguistics, II(3), 223–235.
Deane, P. (2005). A nonparametric method for extraction of candidate phrasal terms. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05) (pp. 605–613), Ann Arbor.
Devereux, B. J., & Costello, F. J. (2011). Learning to interpret novel noun-noun compounds: Evidence from category learning experiments. In Poibeau et al. (2011).
Dominey, P. F., & Inui, T. (2004). A developmental model of syntax acquisition in the construction grammar framework with cross-linguistic validation in English and Japanese. In Proceedings of the ACL’04 Workshop on Psycho-Computational Models of Human Language Acquisition (pp. 33–40), Geneva.
Dras, M. (1995). Automatic identification of support verbs: A step towards a definition of semantic weight. In Proceedings of the Eighth Australian Joint Conference on Artificial Intelligence (pp. 451–458). Singapore: World Scientific.
Dras, M., & Johnson, M. (1996). Death and lightness: Using a demographic model to find support verbs. In Proceedings of the Fifth International Conference on the Cognitive Science of Natural Language Processing (pp. 165–172), Dublin.
Everaert, M., van der Linden, E. -J., Schenk, A., & Schreuder, R. (Eds.). (1995). Idioms: Structural and psychological perspectives. Hillsdale: Lawrence Erlbaum Associates.
Evert, S. (2008). Corpora and collocations. In A. Lüdeling & M. Kytö (Eds.), Corpus linguistics. An international handbook. Berlin: Mouton de Gruyter. Article 58.
Evert, S., Heid, U., & Spranger, K. (2004). Identifying morphosyntactic preferences in collocations. In Proceedings of the 4th Int’l Conference on Language Resources and Evaluation (pp. 907–910), Lisbon.
Fazly, A. (2007). Automatic acquisition of lexical knowledge about multiword predicates. Ph.D. thesis, Department of Computer Science, University of Toronto.
Fazly, A., & Stevenson, S. (2007). Distinguishing subtypes of multiword expressions using linguistically-motivated statistical measures. In Multiword Expression’07: Proceedings of the Workshop on a Broader Perspective on Multiword Expressions (pp. 9–16). Prague: Association for Computational Linguistics.
Fazly, A., Stevenson, S., & North, R. (2007). Automatically learning semantic knowledge about multiword predicates. Journal of Language Resources and Evaluation, 41(1), 61–89.
Fazly, A., Nematzadeh, A., & Stevenson, S. (2009). Acquiring multiword verbs: The role of statistical evidence. In Proceedings of the 31st Annual Conference of the Cognitive Science Society, Amsterdam.
Fazly, A., Alishahi, A., & Stevenson, S. (2010). A probabilistic computational model of cross-situational word learning. Cognitive Science, 34, 1017–1063.
Fellbaum, C. (1993). The determiner in English idioms (pp. 271–295). Hillsdale: Lawrence Erlbaum Associates.
Fellbaum, C. (Ed.). (1998). WordNet, an electronic lexical database. Cambridge/London: MIT Press.
Fisher, C. (2002). Structural limits on verb mapping: The role of abstract structure in 2.5-year-olds’ interpretations of novel verbs. Developmental Science, 5(1), 55–64.
Frank, M., Goodman, N., & Tenenbaum, J. B. (2007). A Bayesian framework for cross-situational word-learning. In Advances in Neural Information Processing Systems. Cambridge/London: MIT Press.
Gentner, D., & France, I. M. (2004). The verb mutability effect: Studies of the combinatorial semantics of nouns and verbs. In S. L. Small, G. W. Cottrell, & M. K. Tanenhaus (Eds.), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence (pp. 343–382). San Mateo: Kaufmann.
Gertner, Y., Fisher, C., & Eisengart, J. (2006). Learning words and rules: Abstract knowledge of word order in early sentence comprehension. Psychological Science, 17(8), 684–691.
Goldberg, A. E. (1995). Constructions: A construction grammar approach to argument structure. Chicago: University of Chicago Press.
Goldberg, A. E. (2006). Constructions at work: The nature of generalization in language. Oxford: Oxford University Press.
Grant, L. E. (2005). Frequency of ‘core idioms’ in the British National Corpus (BNC). International Journal of Corpus Linguistics, 10(4), 429–451.
Grefenstette, G., & Teufel, S. (1995). Corpus-based method for automatic identification of support verbs for nominalization. In Proceedings of the 7th Meeting of the European Chapter of the Association for Computational Linguistics (EACL’95) (pp. 98–103), Dublin.
Israel, M. How children get constructions. In M. Fried & J. -O. Östman (Eds.), Pragmatics in construction grammar and frame semantics. John Benjamins. (submitted)
Karimi, S. (1997). Persian complex verbs: Idiomatic or compositional? Lexicology, 3(1), 273–318.
Kearns, K. (2002). Light verbs in English. Unpublished manuscript. http://www.ling.canterbury.ac.nz/people/kearns.html.
Krott, A., Gagné, C., & Nicoladis, E. (2009). How the parts relate to the whole: Frequency effects on children’s interpretations of novel compounds. Journal of Child Language, 36(1), 85–112.
Kytö, M. (1999). Collocational and idiomatic aspects of verbs in Early Modern English (pp. 167–206). Amsterdam/Philadelphia: John Benjamins Publishing Company.
Li, P., Zhao, X., & MacWhinney, B. (2007). Dynamic self-organization and early lexical development in children. Cognitive Science, 31, 581–612.
Lin, D. (1999). Automatic identification of non-compositional phrases. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (pp. 317–324), College Park. Association for Computational Linguistics.
Lin, T. -H. (2001). Light verb syntax and the theory of phrase structure. Ph.D. thesis, University of California, Irvine.
MacWhinney, B. (2000). The CHILDES project: Tools for analyzing talk (3rd ed., Vol. 2: The database). Mahwah: Lawrence Erlbaum Associates.
McCarthy, D., Keller, B., & Carroll, J. (2003). Detecting a continuum of compositionality in phrasal verbs. In Proceedings of the ACL-SIGLEX Workshop on Multiword Expressions: Analysis, Acquisition and Treatment (pp. 73–80), Sapporo.
Miyamoto, T. (2000). The light verb construction in Japanese: The role of the verbal noun. Amsterdam/Philadelphia: John Benjamins.
Moon, R. (1998). Fixed expressions and idioms in English: A corpus-based approach. New York: Oxford University Press.
Naigles, L., & Kako, E. T. (1993). First contact in verb acquisition: Defining a role for syntax. Child Development, 64, 1665–1687.
Nation, K., Marshall, C. M., & Altmann, G. T. M. (2003). Investigating individual differences in children’s real-time sentence comprehension using language-mediated eye movements. Journal of Experimental Child Psychology, 86, 314–329.
Newman, J. (1996). Give: A cognitive linguistic study. Berlin/New York: Mouton de Gruyter.
Newman, J., & Rice, S. (2004). Patterns of usage for English SIT, STAND, and LIE: A cognitively inspired exploration in corpus linguistics. Cognitive Linguistics, 15(3), 351–396.
Onnis, L., Roberts, M., & Chater, N. (2002). Simplicity: A cure for overgeneralizations in language acquisition. In Proceedings of the 24th Annual Conference of the Cognitive Science Society (pp. 720–725), Fairfax.
Parisien, C., & Stevenson, S. (2010). Learning verb alternations in a usage-based Bayesian model. In Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, Austin.
Pauwels, P. (2000). Put, set, lay and place: A cognitive linguistic approach to verbal meaning. Munich: Lincom Europa.
Perfors, A., Tenenbaum, J. B., & Wonnacott, E. (2010). Variability, negative evidence, and the acquisition of verb argument constructions. Journal of Child Language, 37(3), 607–642.
Quochi, V. (2007). A usage-based approach to light verb constructions in Italian: Development and use. Ph.D. thesis, Università di Pisa.
Regier, T. (2005). The emergence of words: Attentional learning in form and meaning. Cognitive Science, 29, 819–865.
Riehemann, S. (2001). A constructional approach to idioms and word formation. Ph.D. thesis, Stanford University, Stanford.
Sag, I. A., Baldwin, T., Bond, F., Copestake, A., & Flickinger, D. (2002). Multiword expressions: A pain in the neck for NLP. In Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics (CICLing’02) (pp. 1–15), Mexico City, Mexico.
Sagae, K., Davis, E., Lavie, A., MacWhinney, B., & Wintner, S. (2007). High-accuracy annotation and parsing of CHILDES transcripts. In Proceedings of the ACL’07 Workshop on Cognitive Aspects of Computational Language Acquisition, Prague.
Sakas, W., & Fodor, J. D. (2001). The structural triggers learner. In S. Bertolo (Ed.), Language acquisition and learnability (pp. 172–233). Cambridge: Cambridge University Press.
Scott, R. M., & Fisher, C. (2009). Two-year-olds use distributional cues to interpret transitivity-alternating verbs. Language and Cognitive Processes, 24, 777–803.
Smadja, F. (1993). Retrieving collocations from text: Xtract. Computational Linguistics, 19(1), 143–177.
Sosa, A. V., & MacFarlane, J. (2002). Evidence for frequency based constituents in the mental lexicon: Collocations involving the word of. Brain and Language, 83, 227–236.
Theakston, A. L., Lieven, E. V. M., Pine, J. M., & Rowland, C. F. (2002). Going, going, gone: The acquisition of the verb ‘go’. Journal of Child Language, 29, 783–811.
Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition. Cambridge: Harvard University Press.
Venkatapathy, S., & Joshi, A. (2005). Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features. In Proceedings of HLT-EMNLP’05 (pp. 899–906), Vancouver.
Wierzbicka, A. (1982). Why can you Have a Drink when you can’t *Have an Eat? Language, 58(4), 753–799.
Yu, C., & Smith, L. B. (2006). Statistical cross-situational learning to build word-to-world mappings. In Proceedings of the 28th Annual Conference of the Cognitive Science Society, Vancouver.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Nematzadeh, A., Fazly, A., Stevenson, S. (2013). Child Acquisition of Multiword Verbs: A Computational Investigation. In: Villavicencio, A., Poibeau, T., Korhonen, A., Alishahi, A. (eds) Cognitive Aspects of Computational Language Acquisition. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31863-4_9
DOI: https://doi.org/10.1007/978-3-642-31863-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31862-7
Online ISBN: 978-3-642-31863-4
eBook Packages: Computer Science; Computer Science (R0)