Abstract
This paper is the first part of an investigation of contextual predictability models for Russian. It focuses on the linguistic and psychological interpretation of models, features, metrics, and feature sets. The aim is to identify how the implementation of contextual predictability procedures depends on the genre of the text (or text collection): scientific vs. fictional. We construct a model that predicts text elements and specify its features for texts of different genres and domains. We analyze various methods for studying contextual predictability, carry out a computational experiment on scientific and fictional texts, and verify its results with an experiment with informants (cloze tests) and with word embeddings (word2vec CBOW model). As a result, a text processing model is built and evaluated on the basis of the selected contextual predictability features and the experiments with informants.
Acknowledgements
The authors thank the RSF for research grant 18-18-00114.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Krutchenko, O., Pronoza, E., Yagunova, E., Timokhov, V., Ivanets, A. (2020). Contextual Predictability of Texts for Texts Processing and Understanding. In: B. R., P., Thenkanidiyoor, V., Prasath, R., Vanga, O. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2019. Lecture Notes in Computer Science, vol 11987. Springer, Cham. https://doi.org/10.1007/978-3-030-66187-8_11
DOI: https://doi.org/10.1007/978-3-030-66187-8_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66186-1
Online ISBN: 978-3-030-66187-8
eBook Packages: Computer Science, Computer Science (R0)