Using Pre-trained Deeply Contextual Model BERT for Russian Named Entity Recognition

  • Conference paper
Analysis of Images, Social Networks and Texts (AIST 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1086)

Abstract

Named entity recognition is an important part of the information extraction process (extracting structured data from unstructured or semi-structured machine-readable documents). Many approaches are used to highlight mentions of people, organizations, geographical locations, etc. in text. Although the well-known bidirectional LSTM neural networks show good results, there is still room for improvement. Word embedding vectors are usually used as the input layer, but the main disadvantage of these vector models (word2vec, GloVe, FastText) is that they do not take the context of the document into account.

In this paper we present an effective neural network based on the deeply pre-trained bidirectional BERT model, introduced in the fall of 2018, applied to the task of named entity recognition for the Russian language. The BERT model, pre-trained at length on large unannotated text corpora, was used in our work in two modes: feature extraction and fine-tuning for the NER task. Evaluation was carried out on the FactRuEval dataset, with a BiLSTM network (FastText + CNN + extra features) taken as the baseline. Our model, built on the fine-tuned deep contextual BERT model, shows good results.
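To illustrate the fine-tuning mode mentioned in the abstract, the following is a minimal sketch assuming the Hugging Face transformers library, PyTorch, and the bert-base-multilingual-cased checkpoint; the model name, BIO label set, and hyperparameters here are illustrative assumptions, not the authors' exact setup.

    # Minimal sketch of fine-tuning BERT for token classification (NER);
    # an illustrative assumption, not the paper's exact implementation.
    import torch
    from transformers import BertTokenizerFast, BertForTokenClassification

    # Hypothetical BIO label set for FactRuEval-style entities (person, org, location).
    labels = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]
    label2id = {l: i for i, l in enumerate(labels)}

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
    model = BertForTokenClassification.from_pretrained(
        "bert-base-multilingual-cased", num_labels=len(labels)
    )

    # One toy training example: word-level tokens and their BIO tags.
    tokens = ["Владимир", "живёт", "в", "Москве"]
    tags = ["B-PER", "O", "O", "B-LOC"]

    # Align word-level tags with BERT word pieces; continuation pieces and
    # special tokens get the ignore index -100 so they do not affect the loss.
    encoding = tokenizer(tokens, is_split_into_words=True, return_tensors="pt")
    word_ids = encoding.word_ids(0)
    label_ids, prev = [], None
    for wid in word_ids:
        if wid is None or wid == prev:
            label_ids.append(-100)
        else:
            label_ids.append(label2id[tags[wid]])
        prev = wid
    encoding["labels"] = torch.tensor([label_ids])

    # Fine-tuning step: all BERT weights are updated together with the
    # token-classification head. In the feature-extraction mode, by contrast,
    # BERT outputs would be frozen and fed as features to a separate tagger.
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
    model.train()
    loss = model(**encoding).loss
    loss.backward()
    optimizer.step()

In practice the same loop would run over mini-batches of the FactRuEval training data for a few epochs; the single toy example above only shows how the label alignment and the fine-tuning step fit together.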



Author information

Corresponding author

Correspondence to Eugeny Mukhin.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Mukhin, E. (2020). Using Pre-trained Deeply Contextual Model BERT for Russian Named Entity Recognition. In: van der Aalst, W., et al. Analysis of Images, Social Networks and Texts. AIST 2019. Communications in Computer and Information Science, vol 1086. Springer, Cham. https://doi.org/10.1007/978-3-030-39575-9_17

  • DOI: https://doi.org/10.1007/978-3-030-39575-9_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-39574-2

  • Online ISBN: 978-3-030-39575-9

  • eBook Packages: Computer Science, Computer Science (R0)
