Abstract
This paper concerns discourse-new mention detection in Russian. Detecting the mention of an entity newly introduced into discourse may be helpful for various NLP applications, such as coreference resolution, protagonist identification, summarization and different information extraction tasks. We work with Russian, which has no grammatical devices, such as articles in English, for overtly marking a newly introduced referent. Our aim is to check the impact of various features on this task. The focus is on the specific devices for introducing a new discourse-prominent referent in Russian that have been described in theoretical studies. We conduct a pilot study of feature impact and report a series of experiments on detecting the first mention of a referent in a non-singleton coreference chain, drawing on linguistic insights about how structural, morphological and lexical features affect the introduction of a prominent entity into discourse.
Notes
1. The corpus may be downloaded from http://rucoref.maimbava.net.
2. Pronouns are a closed grammatical class; therefore, they may be treated as a list.
Acknowledgments
The authors would like to thank the anonymous reviewers for their helpful comments, the Lomonosov Moscow University students who participated in the corpus markup, and Dmitrij Gorshkov for software support.
This research was supported by a grant from the Russian Foundation for Basic Research (15-07-09306).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Toldova, S., Ionov, M. (2018). Features for Discourse-New Referent Detection in Russian. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9623. Springer, Cham. https://doi.org/10.1007/978-3-319-75477-2_47
DOI: https://doi.org/10.1007/978-3-319-75477-2_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75476-5
Online ISBN: 978-3-319-75477-2
eBook Packages: Computer Science (R0)