Contextual Predictability of Texts for Texts Processing and Understanding

  • Conference paper
Mining Intelligence and Knowledge Exploration (MIKE 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11987)

Abstract

This paper is the first part of an investigation of a contextual predictability model for Russian. It focuses on the linguistic and psychological interpretation of models, features, metrics, and feature sets. The aim is to identify how the implementation of contextual predictability procedures depends on the genre characteristics of a text (or text collection): scientific vs. fictional. We construct a model that predicts text elements and specify its features for texts of different genres and domains. We analyze various methods for studying contextual predictability, carry out a computational experiment on scientific and fictional texts, and verify its results with an experiment with informants (cloze tests) and with word embeddings (the word2vec CBOW model). As a result, a text processing model is built and evaluated on the basis of the selected contextual predictability features and the experiments with informants.
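
To make the notion of contextual predictability concrete, the following is a minimal sketch that estimates how predictable a word is from its left neighbour, using smoothed bigram counts and surprisal. It is an illustrative assumption, not the model built in the paper: the function names (`train_bigrams`, `predictability`, `surprisal`), the add-alpha smoothing, the one-word context window, and the toy English corpus are all hypothetical choices made for this example.

```python
import math
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """Count how often each word follows each left-context word."""
    follow = defaultdict(Counter)
    for prev, cur in zip(tokens, tokens[1:]):
        follow[prev][cur] += 1
    return follow


def predictability(follow, prev, word, vocab_size, alpha=1.0):
    """Add-alpha smoothed estimate of P(word | prev).

    The smoothing scheme is an assumption for this sketch only.
    """
    counts = follow.get(prev, Counter())
    total = sum(counts.values())
    return (counts[word] + alpha) / (total + alpha * vocab_size)


def surprisal(p):
    """Surprisal in bits: lower values mean a more predictable word."""
    return -math.log2(p)


# Toy corpus (hypothetical); a real study would use genre-specific collections.
toy_corpus = "the cat sat on the mat and the cat slept".split()
follow = train_bigrams(toy_corpus)
vocab_size = len(set(toy_corpus))

# Compare two candidate fillers for the cloze-style gap "the ___".
p_cat = predictability(follow, "the", "cat", vocab_size)
p_mat = predictability(follow, "the", "mat", vocab_size)
print(f"P(cat | the) = {p_cat:.2f}, surprisal = {surprisal(p_cat):.2f} bits")
print(f"P(mat | the) = {p_mat:.2f}, surprisal = {surprisal(p_mat):.2f} bits")
```

In a cloze test, informants fill the same kind of gap, so the share of informants who restore a given word can be compared with such corpus-based estimates. Training the counts separately on scientific and fictional collections would be one simple way to look for the genre dependence the abstract describes.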

Acknowledgements

The authors acknowledge the RSF for the research grant 18-18-00114.

Corresponding author

Correspondence to Ekaterina Pronoza.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Krutchenko, O., Pronoza, E., Yagunova, E., Timokhov, V., Ivanets, A. (2020). Contextual Predictability of Texts for Texts Processing and Understanding. In: B. R., P., Thenkanidiyoor, V., Prasath, R., Vanga, O. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2019. Lecture Notes in Computer Science, vol 11987. Springer, Cham. https://doi.org/10.1007/978-3-030-66187-8_11

  • DOI: https://doi.org/10.1007/978-3-030-66187-8_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66186-1

  • Online ISBN: 978-3-030-66187-8

  • eBook Packages: Computer Science, Computer Science (R0)
