Abstract
Young children exhibit knowledge of abstract syntactic categories of words, such as noun and verb. A key research question concerns the type of information that children might use to form such categories. We use a computational model to provide insights into the differential and cooperative roles of various information sources (namely, distributional, morphological, phonological, and semantic properties of words) in children’s early word categorization. Specifically, we use an unsupervised incremental clustering algorithm to learn categories of words from different combinations of these information sources, and determine the role of each type of cue by evaluating the quality of the resulting categories. We conduct two types of experiments. First, we compare the categories learned by our model to a set of gold-standard part-of-speech (PoS) tags, such as verb and noun. Second, we perform an experiment that simulates a language task similar to one performed by children, as reported in a psycholinguistic study by Brown (J Abnor Soc Psychol 55(1):1–5, 1957). Our results suggest that different categories of words may be recognized by relying on different types of cues. The results also indicate the importance of knowledge of word meanings for syntactic categorization, and vice versa: adding semantic information leads to categories that better match the gold-standard parts of speech. Conversely, our model (like children) can predict the semantic class of a word (e.g., action or object) by drawing on its learned knowledge of the word’s syntactic category.
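The general idea of clustering words incrementally from different cue combinations, and then scoring the clusters against gold-standard tags, can be illustrated with a minimal sketch. This is not the chapter's algorithm: the similarity function, the threshold, the feature names (`prev`, `suffix`, `sem`), and the toy data are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def select_cues(item, cues):
    """Keep only the chosen cue types (hypothetical feature names)."""
    return {k: v for k, v in item.items() if k in cues}


def similarity(item, cluster):
    """Mean share of cluster members that agree with the item on each feature."""
    if not item:
        return 0.0
    agree = sum(cluster["counts"][f][v] / cluster["size"] for f, v in item.items())
    return agree / len(item)


def incremental_cluster(items, threshold=0.5):
    """Process items one at a time: join the best cluster or open a new one."""
    clusters, assignments = [], []
    for item in items:
        best_idx, best_sim = None, 0.0
        for idx, cluster in enumerate(clusters):
            s = similarity(item, cluster)
            if s > best_sim:
                best_idx, best_sim = idx, s
        if best_idx is None or best_sim < threshold:
            clusters.append({"size": 0, "counts": defaultdict(Counter)})
            best_idx = len(clusters) - 1
        chosen = clusters[best_idx]
        chosen["size"] += 1
        for f, v in item.items():
            chosen["counts"][f][v] += 1
        assignments.append(best_idx)
    return assignments


def majority_accuracy(assignments, gold):
    """Label each cluster with its most frequent gold tag and score the match."""
    by_cluster = defaultdict(Counter)
    for a, g in zip(assignments, gold):
        by_cluster[a][g] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return correct / len(gold)


# Toy word usages (invented): previous word, suffix, coarse semantic class.
items = [
    {"prev": "the", "suffix": "none", "sem": "object"},
    {"prev": "to", "suffix": "none", "sem": "action"},
    {"prev": "a", "suffix": "none", "sem": "object"},
    {"prev": "is", "suffix": "ing", "sem": "action"},
]
gold = ["N", "V", "N", "V"]

all_cues = incremental_cluster(items)
suffix_only = incremental_cluster([select_cues(i, {"suffix"}) for i in items])
print(majority_accuracy(all_cues, gold), majority_accuracy(suffix_only, gold))
```

Running the same clustering routine on different cue subsets and comparing the scores mirrors, in a very reduced form, the kind of comparison the abstract describes; the numbers from this toy example carry no empirical weight.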
Notes
1. In earlier experiments, we also included the first phoneme (beginning) of a word, a feature also considered by Onnis and Christiansen [22]. In our initial evaluations, we found that including this feature did not affect performance, and hence excluded it from further consideration.
2. The authors are grateful to Christopher Parisien for providing a preprocessed version of this corpus.
3. The “Null” value is treated as a missing value for a feature.
4.
5. We have performed similar experiments with different ranges of cluster numbers, and found that the general patterns in the results are similar. In Appendix A, we report the results of experiments in which we set the number of clusters within the range 346–500 (< 500). In general, we prefer fewer clusters (fewer than our vocabulary size) to allow for generalization. We expect the generalization ability of the model with 247–288 (< 300) clusters to be reasonably good, since more than 55% of these clusters contain three or more word types in all conditions.
6. In both the training and test data, less than 6% of the vocabulary consists of adjectives.
7. Although the results show that using semantic features substantially improves the prediction accuracies for adjectives and determiners, this effect is due to the nature of the semantic features for these words (taken from Harm [14]) and should be interpreted with caution.
8. More detailed results of the novel word categorization experiment are included in the Appendix.
References
Alishahi, A., & Chrupała, G. (2009). Lexical category acquisition as an incremental process. In CogSci-2009 Workshop on Psychocomputational Models of Human Language Acquisition, Amsterdam.
Asr, F. T., Fazly, A., & Azimifar, Z. (2010). The effect of word-internal properties on syntactic categorization: A computational modeling approach. In Proceedings of the 32nd Annual Conference of the Cognitive Science Society, Portland, USA.
Berko, G. J. (1958). The child’s learning of English morphology. Word, 14, 150–177.
Brown, R. (1957). Linguistic determinism and the part of speech. Journal of Abnormal and Social Psychology, 55(1), 1–5.
Cartwright, T., & Brent, M. (1997). Syntactic categorization in early language acquisition: Formalizing the role of distributional analysis. Cognition, 63(2), 121–170.
Chang, F., Lieven, E., & Tomasello, M. (2008). Automatic evaluation of syntactic learners in typologically-different languages. Cognitive Systems Research, 9(3), 198–213.
Chrupała, G., & Alishahi, A. (2010). Online entropy-based model of lexical category acquisition. In Proceedings of 14th Conference on Computational Natural Language Learning (CoNLL) (pp. 182–191), Uppsala, Sweden.
Clark, A. (2000). Inducing syntactic categories by context distribution clustering. In Proceedings of the 2nd Workshop on Learning Language in Logic and the 4th Conference on Computational Natural Language Learning (Vol. 7, pp. 91–94). Morristown: Association for Computational Linguistics.
Fazly, A., Alishahi, A., & Stevenson, S. (2008). A probabilistic incremental model of word learning in the presence of referential uncertainty. In Proceedings of the 30th Annual Conference of the Cognitive Science Society, Washington, DC.
Fellbaum, C. (1998). WordNet: An electronic lexical database. Cambridge: The MIT Press. ISBN 026206197X.
Gelman, S., & Taylor, M. (1984). How two-year-old children interpret proper and common names for unfamiliar objects. Child Development, 55(4), 1535–1540.
Gerken, L., Wilson, R., & Lewis, W. (2005). Infants can use distributional cues to form syntactic categories. Journal of Child Language, 32(2), 249–268.
Goldwater, S., Griffiths, T. L., & Johnson, M. (2009). A Bayesian framework for word segmentation: Exploring the effects of context. Cognition, 112(1), 21–54.
Harm, M. (2002). Building large scale distributed semantic feature sets with WordNet (Tech. Rep. No. PDP. CNS. 02.01). Carnegie Mellon University, Center for the Neural Basis of Cognition, Pittsburgh, PA.
Kaplan, F., Oudeyer, P., & Bergen, B. (2008). Computational models in the debate over language learnability. Infant and Child Development, 17(1), 55–80.
Kemp, N., Lieven, E., & Tomasello, M. (2005). Young children’s knowledge of the “determiner” and “adjective” categories. Journal of Speech, Language, and Hearing Research, 48(3), 592–602.
Kipper-Schuler, K. (2005). VerbNet: A broad-coverage, comprehensive verb lexicon. Ph.D. thesis, University of Pennsylvania, Philadelphia.
MacWhinney, B. (2000). The CHILDES project: Tools for analyzing talk, volume 2: The database (3rd ed.). Mahwah: Lawrence Erlbaum Associates.
Mintz, T. (2003). Frequent frames as a cue for grammatical categories in child directed speech. Cognition, 90(1), 91–117.
Monaghan, P., Christiansen, M., & Chater, N. (2007). The phonological-distributional coherence hypothesis: Cross-linguistic evidence in language acquisition. Cognitive Psychology, 55(4), 259–305.
Naigles, L. (1990). Children use syntax to learn verb meanings. Journal of Child Language, 17, 357–374.
Onnis, L., & Christiansen, M. (2008). Lexical categories at the edge of the word. Cognitive Science, 32(1), 184–221.
Parisien, C., Fazly, A., & Stevenson, S. (2008). An incremental Bayesian model for learning syntactic categories. In Proceedings of the Twelfth Conference on Computational Natural Language Learning (pp. 89–96). New York: Association for Computational Linguistics.
Pearl, L. (2009). Using computational modeling in language acquisition research. Experimental Methods in Language Acquisition Research, 163–184.
Redington, M., Chater, N., & Finch, S. (1998). Distributional information: A powerful cue for acquiring syntactic categories. Cognitive Science, 22(4), 425–469.
Samuelson, L., & Smith, L. (1999). Early noun vocabularies: Do ontology, category structure and syntax correspond? Cognition, 73(1), 1–33.
Schütze, H. (1995). Distributional part-of-speech tagging. In Proceedings of the Seventh Conference on European Chapter of the Association for Computational Linguistics (pp. 141–148). San Francisco: Morgan Kaufmann Publishers Inc.
Theakston, A. L., Lieven, E. V., Pine, J. M., & Rowland, C. F. (2001). The role of performance limitations in the acquisition of verb–argument structure: An alternative account. Journal of Child Language, 28, 127–152.
Wilson, M. (1988). MRC psycholinguistic database: Machine-usable dictionary, version 2.00. Behavior Research Methods, 20(1), 6–10.
Appendix
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Asr, F.T., Fazly, A., Azimifar, Z. (2012). From Cues to Categories: A Computational Study of Children’s Early Word Categorization. In: Villavicencio, A., Poibeau, T., Korhonen, A., Alishahi, A. (eds) Cognitive Aspects of Computational Language Acquisition. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31863-4_4
DOI: https://doi.org/10.1007/978-3-642-31863-4_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31862-7
Online ISBN: 978-3-642-31863-4
eBook Packages: Computer Science (R0)