Abstract
We present two axiomatic and three conjectural conditions that a model inducing natural language categories should satisfy if it is to be considered “cognitively plausible”. The first axiomatic condition is that the model should involve a bootstrapping component. The second axiomatic condition is that it should be data-driven. The first conjectural condition demands that the model integrate surface features (related to prosody, phonology and morphology) somewhat more intensively than existing Markov-inspired models do. The second conjectural condition demands that, besides integrating symbolic and connectionist aspects, the model in question should exploit the global geometric and topological properties of the vector spaces upon which it operates. Finally, we argue that the model should facilitate qualitative evaluation, for example in the form of a Turing Test oriented towards part-of-speech induction (POS-i). To support our claims, we present a POS-induction model based on trivial k-way clustering of vectors representing suffixal and co-occurrence information present in parts of the Multext-East corpus. Even in the very initial stages of its development, the model outperforms some more complex probabilistic POS-induction models at lower computational cost.
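To make the kind of model described above concrete, the following is a minimal sketch and not the chapter's implementation. It assumes that each word type is represented by a sparse vector of suffix features and left-neighbour co-occurrence counts, and it uses plain k-means with cosine similarity as a stand-in for k-way clustering. The function names (`build_vectors`, `k_way_cluster`), the suffix length, the seeding strategy and the toy sentence are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def build_vectors(tokens, suffix_len=2):
    """Map each word type to sparse counts of its suffix and left neighbours."""
    vectors = defaultdict(Counter)
    previous = "<s>"  # hypothetical sentence-start marker
    for token in tokens:
        vectors[token]["suffix=" + token[-suffix_len:]] += 1
        vectors[token]["left=" + previous] += 1
        previous = token
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(feature, 0.0) for feature, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(members):
    """Average a non-empty list of sparse vectors."""
    total = Counter()
    for vec in members:
        total.update(vec)
    return {feature: weight / len(members) for feature, weight in total.items()}


def k_way_cluster(vectors, k, iterations=10):
    """Assign every word type to one of k clusters (requires k <= vocabulary size)."""
    words = sorted(vectors)
    # Assumption: deterministic seeding with the first k word types.
    centroids = [dict(vectors[w]) for w in words[:k]]
    assignment = {}
    for _ in range(iterations):
        new_assignment = {
            w: max(range(k), key=lambda c: cosine(vectors[w], centroids[c]))
            for w in words
        }
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        for c in range(k):
            members = [vectors[w] for w in words if assignment[w] == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = centroid(members)
    return assignment


tokens = "the cat walked and the dog jumped while a bird talked".split()
clusters = k_way_cluster(build_vectors(tokens), k=3)
for word, cluster_id in sorted(clusters.items()):
    print(cluster_id, word)
```

On a corpus of realistic size, the induced clusters would then be compared with gold-standard part-of-speech tags. The toy sentence here only exercises the pipeline and says nothing about the quality of the resulting categories.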
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Hromada, D.D. (2014). Conditions for Cognitive Plausibility of Computational Models of Category Induction. In: Laurent, A., Strauss, O., Bouchon-Meunier, B., Yager, R.R. (eds) Information Processing and Management of Uncertainty in Knowledge-Based Systems. IPMU 2014. Communications in Computer and Information Science, vol 443. Springer, Cham. https://doi.org/10.1007/978-3-319-08855-6_11
DOI: https://doi.org/10.1007/978-3-319-08855-6_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08854-9
Online ISBN: 978-3-319-08855-6
eBook Packages: Computer Science (R0)