Features for Discourse-New Referent Detection in Russian

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9623)

Abstract

This paper concerns discourse-new mention detection in Russian. Detecting the mention of an entity newly introduced into discourse can help various NLP applications, such as coreference resolution, protagonist identification, summarization, and a range of information extraction tasks. Russian has no grammatical devices, like articles in English, for overtly marking a newly introduced referent. Our aim is to assess the impact of various features on this task, with a focus on the specific devices for introducing a new discourse-prominent referent in Russian that have been described in theoretical studies. We conduct a pilot study of feature impact and report a series of experiments on detecting the first mention of a referent in a non-singleton coreference chain, drawing on linguistic insights about how structural, morphological and lexical features signal the introduction of a prominent entity into discourse.
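The target of the task, the first mention of a referent in a non-singleton coreference chain, can be illustrated with a minimal sketch. It is not the authors' implementation: the input format (pairs of a mention position and a chain identifier) and the function name `first_mentions` are assumptions made purely for illustration, and the sketch says nothing about how the remaining mentions are treated in the experiments.

```python
from collections import defaultdict


def first_mentions(mentions):
    """Return positions of first mentions in non-singleton chains.

    `mentions` is a hypothetical input format: an iterable of
    (position, chain_id) pairs, where `position` is the offset of the
    mention in the text and `chain_id` names its coreference chain.
    """
    # Group mention positions by the chain they belong to.
    chains = defaultdict(list)
    for position, chain_id in mentions:
        chains[chain_id].append(position)

    targets = set()
    for positions in chains.values():
        # A chain with a single mention is a singleton and is skipped.
        if len(positions) >= 2:
            # The earliest mention introduces the referent.
            targets.add(min(positions))
    return targets


# Toy document: chains "a" and "b" are non-singleton, "c" is a singleton.
example = [(0, "a"), (5, "b"), (9, "a"), (14, "c"), (20, "b"), (31, "a")]
print(sorted(first_mentions(example)))
```

In this toy input, the mentions at positions 0 and 5 open chains that are later continued, so they are the ones a discourse-new detector of this kind would be expected to flag; the mention at position 14 is never picked up again and is left out.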

Notes

  1. The corpus may be downloaded from http://rucoref.maimbava.net.

  2. Pronouns form a closed grammatical class, so they can be treated as a list.

Acknowledgments

The authors would like to thank the anonymous reviewers for their helpful comments, the Lomonosov Moscow University students who participated in the corpus markup, and Dmitrij Gorshkov for software support.

This research was supported by a grant from the Russian Foundation for Basic Research (15-07-09306).

Author information

Corresponding author

Correspondence to Max Ionov.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Toldova, S., Ionov, M. (2018). Features for Discourse-New Referent Detection in Russian. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9623. Springer, Cham. https://doi.org/10.1007/978-3-319-75477-2_47

  • DOI: https://doi.org/10.1007/978-3-319-75477-2_47

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75476-5

  • Online ISBN: 978-3-319-75477-2

  • eBook Packages: Computer Science, Computer Science (R0)
