Contextual Predictability of Texts for Texts Processing and Understanding

Krutchenko, Olga; Pronoza, Ekaterina; Yagunova, Elena; Timokhov, Viktor; Ivanets, Alexander

doi:10.1007/978-3-030-66187-8_11


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11987)

Included in the following conference series:

International Conference on Mining Intelligence and Knowledge Exploration

Abstract

This paper is the first part of an investigation of a contextual predictability model for Russian. It focuses on the linguistic and psychological interpretation of models, features, metrics, and feature sets. The aim is to identify how the implementation of contextual predictability procedures depends on the genre characteristics of a text (or text collection): scientific vs. fictional. We construct a model that predicts text elements and specify its features for texts of different genres and domains. We analyze various methods for studying contextual predictability, carry out a computational experiment on scientific and fictional texts, and verify its results with an experiment with informants (cloze tests) and with word embeddings (a word2vec CBOW model). As a result, a text processing model is built and evaluated on the basis of the selected contextual predictability features and the experiments with informants.
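To make the notion of contextual predictability concrete, the following is a minimal sketch and not the authors' model: it assumes that the predictability of a word can be approximated by its add-one-smoothed bigram probability given the preceding word, and that a cloze gap is filled with the most frequent continuation. The function names, the toy English corpus, and the choice of a bigram model are all illustrative assumptions; the paper itself works with Russian texts, informants, and a word2vec CBOW model.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count left contexts and bigrams; return them with the vocabulary size."""
    contexts = Counter(tokens[:-1])             # words that have a successor
    bigrams = Counter(zip(tokens, tokens[1:]))  # (previous word, word) pairs
    return contexts, bigrams, len(set(tokens))


def predictability(prev, word, contexts, bigrams, vocab_size):
    """Add-one-smoothed estimate of P(word | prev) (illustrative assumption)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)


def surprisal(prev, word, contexts, bigrams, vocab_size):
    """Surprisal in bits: lower values mean the word is more predictable."""
    return -math.log2(predictability(prev, word, contexts, bigrams, vocab_size))


def cloze_guess(prev, bigrams):
    """Fill a gap after `prev` with its most frequent observed continuation."""
    candidates = {w: c for (p, w), c in bigrams.items() if p == prev}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical toy corpus, used only to exercise the functions above.
toy_tokens = "the cat sat on the mat the cat ate the fish".split()
contexts, bigrams, vocab_size = train_bigram(toy_tokens)
```

Under these assumptions, comparing `surprisal("the", "cat", ...)` with `surprisal("the", "fish", ...)` shows which continuation the toy model finds more predictable, and `cloze_guess("the", bigrams)` plays the role of a simulated informant answer that could be compared against human cloze responses.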

Acknowledgements

The authors acknowledge the RSF for the research grant 18-18-00114.

Author information

Authors and Affiliations

St. Petersburg State University, St. Petersburg, Russian Federation
Olga Krutchenko, Ekaterina Pronoza, Elena Yagunova, Viktor Timokhov & Alexander Ivanets

Corresponding author

Correspondence to Ekaterina Pronoza.

Editor information

Editors and Affiliations

National Institute of Technology, Goa, India
Purushothama B. R.
National Institute of Technology, Goa, India
Veena Thenkanidiyoor
Indian Institute of Information Technology, Sri City, India
Rajendra Prasath
Indian Institute of Information Technology, Sri City, India
Odelu Vanga

Cite this paper

Krutchenko, O., Pronoza, E., Yagunova, E., Timokhov, V., Ivanets, A. (2020). Contextual Predictability of Texts for Texts Processing and Understanding. In: B. R., P., Thenkanidiyoor, V., Prasath, R., Vanga, O. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2019. Lecture Notes in Computer Science, vol 11987. Springer, Cham. https://doi.org/10.1007/978-3-030-66187-8_11

DOI: https://doi.org/10.1007/978-3-030-66187-8_11
Published: 20 December 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66186-1
Online ISBN: 978-3-030-66187-8
eBook Packages: Computer Science, Computer Science (R0)
