Features for Discourse-New Referent Detection in Russian

Toldova, Svetlana; Ionov, Max

doi:10.1007/978-3-319-75477-2_47


Conference paper
First Online: 21 March 2018

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9623)

Abstract

This paper concerns discourse-new mention detection in Russian. Detecting the mention of an entity newly introduced into discourse can be helpful for NLP applications such as coreference resolution, protagonist identification, summarization, and various information extraction tasks. Russian has no grammatical devices, such as articles in English, for overtly marking a newly introduced referent. Our aim is to check the impact of various features on this task. The focus is on the specific devices for introducing a new discourse-prominent referent in Russian that are described in theoretical studies. We conduct a pilot study of feature impact and provide a series of experiments on detecting the first mention of a referent in a non-singleton coreference chain, drawing on linguistic insights about how the introduction of a prominent entity into discourse is affected by structural, morphological and lexical features.
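The target label described in the abstract, the first mention of a referent in a non-singleton coreference chain, can be illustrated with a minimal sketch. The mention representation (a chain identifier paired with a character offset) and the function name `label_discourse_new` are assumptions made for illustration only; they do not reflect the corpus format or the authors' implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical mention format: (chain_id, start_offset_in_text).
Mention = Tuple[int, int]


def label_discourse_new(mentions: List[Mention]) -> Dict[Mention, bool]:
    """Label a mention True only if it opens a chain with more than one mention."""
    # Group mentions by the coreference chain they belong to.
    chains: Dict[int, List[Mention]] = {}
    for mention in mentions:
        chains.setdefault(mention[0], []).append(mention)

    labels: Dict[Mention, bool] = {}
    for chain in chains.values():
        # Order mentions by their position in the text.
        ordered = sorted(chain, key=lambda m: m[1])
        for index, mention in enumerate(ordered):
            # Singleton chains never yield a positive label.
            labels[mention] = index == 0 and len(ordered) > 1
    return labels


# Chain 1 has three mentions; chain 2 is a singleton.
example = [(1, 0), (1, 40), (2, 10), (1, 75)]
labels = label_discourse_new(example)
```

Under this assumed format, only the mention at offset 0 in chain 1 receives a positive label; the singleton mention in chain 2 is treated as a negative example.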

Notes

1. The corpus may be downloaded from http://rucoref.maimbava.net.
2. Pronouns are a closed grammatical class; therefore, they may be treated as a list.

Acknowledgments

The authors would like to thank the anonymous reviewers for their helpful comments, the Lomonosov Moscow University students who participated in the corpus markup, and Dmitrij Gorshkov for software support.

This research was supported by a grant from the Russian Foundation for Basic Research (15-07-09306).

Author information

Authors and Affiliations

National Research University “Higher School of Economics”, Moscow, Russia
Svetlana Toldova
Goethe University Frankfurt, Frankfurt, Germany
Max Ionov
Moscow State University, Moscow, Russia
Max Ionov

Corresponding author

Correspondence to Max Ionov.

Editor information

Editors and Affiliations

CIC, Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

Cite this paper

Toldova, S., Ionov, M. (2018). Features for Discourse-New Referent Detection in Russian. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9623. Springer, Cham. https://doi.org/10.1007/978-3-319-75477-2_47

DOI: https://doi.org/10.1007/978-3-319-75477-2_47
Published: 21 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75476-5
Online ISBN: 978-3-319-75477-2
eBook Packages: Computer Science, Computer Science (R0)
