Introducing Baselines for Russian Named Entity Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7816)

Abstract

Current research efforts in Named Entity Recognition (NER) deal mostly with the English language. Even though interest in multi-language Information Extraction is growing, only a few works report results for the Russian language. This paper introduces quality baselines for the Russian NER task. We propose a corpus that was manually annotated with organization and person names. The main purpose of this corpus is to provide a gold standard for evaluation. We implemented and evaluated two approaches to NER: knowledge-based and statistical. The first one comprises several components: dictionary matching, pattern matching and rule-based search of lexical representations of entity names within a document. We assembled a set of linguistic resources and evaluated their impact on performance. For the data-driven approach we utilized our implementation of a linear-chain CRF which uses a rich set of features. The performance of both systems is promising (62.17% and 75.05% F1 measure), although they do not employ morphological or syntactic analysis.
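
To make the dictionary-matching component and the F1 measure mentioned above concrete, here is a minimal sketch. It is not the authors' implementation: the function names, the toy gazetteer, the example sentence, and the exact-span matching criterion are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Dict, List, Set, Tuple

# A span is (start token index, end token index exclusive, entity label).
Span = Tuple[int, int, str]


def dictionary_match(tokens: List[str],
                     gazetteer: Dict[Tuple[str, ...], str]) -> Set[Span]:
    """Greedy longest-match lookup of token sequences in a gazetteer.

    Hypothetical helper: keys are lowercased token tuples, values are labels.
    """
    max_len = max((len(key) for key in gazetteer), default=0)
    spans: Set[Span] = set()
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate first so a full name beats a first name.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in gazetteer:
                spans.add((i, i + n, gazetteer[key]))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return spans


def precision_recall_f1(gold: Set[Span],
                        predicted: Set[Span]) -> Tuple[float, float, float]:
    """Entity-level scores, assuming exact match on boundaries and label."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (invented for this sketch, not taken from the paper's corpus).
tokens = ["Иван", "Петров", "работает", "в", "Газпроме", "."]
gazetteer = {("иван", "петров"): "PER", ("газпром",): "ORG"}
gold = {(0, 2, "PER"), (4, 5, "ORG")}

predicted = dictionary_match(tokens, gazetteer)
precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"{precision:.3f} {recall:.3f} {f1:.3f}")  # prints "1.000 0.500 0.667"
```

In this toy sentence the inflected form "Газпроме" does not match the dictionary entry "газпром", so the organization is missed and recall drops to 0.5. This is the kind of miss one would expect from an exact-match lookup without morphological analysis.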

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gareev, R., Tkachenko, M., Solovyev, V., Simanovsky, A., Ivanov, V. (2013). Introducing Baselines for Russian Named Entity Recognition. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37247-6_27

  • DOI: https://doi.org/10.1007/978-3-642-37247-6_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37246-9

  • Online ISBN: 978-3-642-37247-6

  • eBook Packages: Computer Science, Computer Science (R0)
