Abstract
The analysis of word patterns in a corpus has previously been examined using a number of different word embedding models. These models create a numeric representation of word co-occurrence and are able to capture some of the syntactic and semantic relationships between words in a document. Language complexity has been assessed for many years using simple indexes and basic statistical properties (word frequency, etc.); however, little work has been done on using word embeddings to develop language complexity measures. This paper describes preliminary work on measuring language complexity using clustered word embeddings to produce network transition models. The structural measures of these transition networks are shown to represent basic properties of language complexity and may be used to infer some aspects of the underlying generative grammar.
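The pipeline outlined in the abstract (cluster the word embeddings, then treat consecutive words as transitions between clusters) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 2-D toy embeddings, the small deterministic k-means, the function names, and the choice of edge density as the structural measure.

```python
from collections import Counter
from math import dist


def kmeans(vectors, k, iters=20):
    """Tiny deterministic k-means: the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign every vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(v, centroids[c]))
                  for v in vectors]
        # Move each centroid to the mean of its members (if it has any).
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels


def transition_network(tokens, embeddings, k):
    """Cluster the vocabulary, then count cluster-to-cluster transitions."""
    vocab = sorted(embeddings)
    labels = kmeans([embeddings[w] for w in vocab], k)
    cluster_of = dict(zip(vocab, labels))
    # Replace each token by its cluster label; skip out-of-vocabulary tokens.
    seq = [cluster_of[t] for t in tokens if t in cluster_of]
    # Each pair of consecutive labels is one weighted, directed edge.
    edges = Counter(zip(seq, seq[1:]))
    return cluster_of, edges


def density(edges, k):
    """Fraction of the k*k possible directed edges (self-loops included) seen."""
    return len(edges) / (k * k)


# Hypothetical 2-D embeddings; real models use far more dimensions.
embeddings = {
    "the": (0.0, 0.0), "a": (0.1, 0.0),
    "cat": (1.0, 1.0), "dog": (1.0, 0.9),
    "sat": (0.0, 1.0), "ran": (0.1, 1.0),
}
tokens = "the cat sat a dog ran the dog sat".split()

cluster_of, edges = transition_network(tokens, embeddings, k=3)
net_density = density(edges, 3)
print(cluster_of)
print(dict(edges))
print(net_density)
```

In this toy corpus every sentence follows the same determiner, noun, verb pattern, so only a few of the possible cluster-to-cluster edges appear and the network is sparse. A text with more varied constructions would be expected to produce a denser network, which is the kind of structural signal the abstract refers to.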
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Whigham, P.A., Chugh, M., Dick, G. (2018). Measuring Language Complexity Using Word Embeddings. In: Mitrovic, T., Xue, B., Li, X. (eds) AI 2018: Advances in Artificial Intelligence. AI 2018. Lecture Notes in Computer Science, vol 11320. Springer, Cham. https://doi.org/10.1007/978-3-030-03991-2_76
DOI: https://doi.org/10.1007/978-3-030-03991-2_76
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03990-5
Online ISBN: 978-3-030-03991-2
eBook Packages: Computer Science (R0)