Abstract
Named entity recognition is an important part of the information extraction process (extracting structured data from unstructured or semi-structured machine-readable documents). Many approaches are used to highlight mentions of people, organizations, geographical locations, etc. in text. Although well-known bidirectional LSTM neural networks show good results, there is room for improvement. Usually, word embedding vectors are used as the input layer, but the main disadvantage of the most recent vector models (word2vec, GloVe, FastText) is that they do not take the context of a document into account: a word receives the same vector regardless of the words around it.
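As a toy illustration of this limitation (not from the paper; the gensim library and the two example sentences are our own assumptions), the following sketch shows that a static FastText model returns one and the same vector for a word in every context:

```python
# Minimal sketch: static embeddings are context-independent.
# gensim's FastText returns the same vector for "bank" whether it
# appears in a river context or a financial one.
import numpy as np
from gensim.models import FastText

corpus = [
    ["he", "sat", "on", "the", "river", "bank"],
    ["she", "opened", "an", "account", "at", "the", "bank"],
]
model = FastText(sentences=corpus, vector_size=32, min_count=1, epochs=10)

# The lookup ignores the surrounding words entirely:
v = model.wv["bank"]
assert np.allclose(v, model.wv["bank"])  # identical in both contexts
```

A contextual model, by contrast, would produce two different representations for the two occurrences of "bank".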
In this paper we present an effective neural network based on the deeply pre-trained bidirectional BERT model, introduced in the fall of 2018, applied to the task of named entity recognition for the Russian language. The BERT model, pre-trained for a long time on large unannotated text corpora, was used in our work in two modes: feature extraction and fine-tuning for the NER task. Evaluation was carried out on the FactRuEval dataset, with a BiLSTM network (FastText + CNN + extra) taken as the baseline. Our model, built on the fine-tuned deeply contextual BERT model, shows good results; a code sketch of the two modes follows.
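The following is a hedged sketch of what the two usage modes look like in code. The paper does not name a specific library; the HuggingFace transformers package, the bert-base-multilingual-cased checkpoint, the Russian example sentence, and the label count below are all illustrative assumptions:

```python
# Sketch of BERT's two usage modes for NER (illustrative, not the
# paper's exact setup): feature extraction vs. fine-tuning.
import torch
from transformers import AutoTokenizer, AutoModel, AutoModelForTokenClassification

CHECKPOINT = "bert-base-multilingual-cased"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
text = "Владимир работает в Яндексе в Москве."
inputs = tokenizer(text, return_tensors="pt")

# Mode 1: feature extraction -- BERT weights stay frozen and the
# contextual hidden states feed a separate downstream tagger
# (e.g. a BiLSTM-CRF) in place of static word embeddings.
encoder = AutoModel.from_pretrained(CHECKPOINT)
encoder.eval()
with torch.no_grad():
    features = encoder(**inputs).last_hidden_state  # (1, seq_len, 768)

# Mode 2: fine-tuning -- a token-classification head is added and the
# whole network, including all BERT layers, is trained on NER labels.
num_labels = 9  # e.g. BIO tags for PER/ORG/LOC plus "O" (assumed)
model = AutoModelForTokenClassification.from_pretrained(
    CHECKPOINT, num_labels=num_labels
)
labels = torch.zeros_like(inputs["input_ids"])  # dummy targets for the sketch
loss = model(**inputs, labels=labels).loss
loss.backward()  # gradients flow through every transformer layer
```

In the first mode only the downstream tagger is trained on the frozen contextual features; in the second, every transformer weight is updated, which corresponds to the fine-tuned configuration reported above.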
References
Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th International Conference on Machine Learning, pp. 282–289 (2001)
Goller, C., Kuchler, A.: Learning task-dependent distributed representations by back-propagation through structure. In: 1996 IEEE International Conference on Neural Networks, vol. 1, pp. 347–352. IEEE (1996)
Gers, F.A., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with LSTM. Neural Comput. 12(10), 2451–2471 (2000)
Hammerton, J.: Named entity recognition with long short-term memory. In: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, pp. 172–175. Association for Computational Linguistics (2003)
Collobert, R., Weston, J.: A unified architecture for natural language processing: deep neural networks with multitask learning. In: Proceedings of the 25th International Conference on Machine Learning, pp. 160–167. ACM (2008)
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011)
Labeau, M., Loser, K., Allauzen, A.: Non-lexical neural architecture for fine-grained POS tagging. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 232–237. Association for Computational Linguistics (2015)
Huang, Z., Xu, W., Yu, K.: Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991 (2015)
Chiu, J.P.C., Nichols, E.: Named entity recognition with bidirectional LSTM-CNNs. arXiv preprint arXiv:1511.08308 (2015)
Turian, J., Ratinov, L., Bengio, Y.: Word representations: a simple and general method for semi-supervised learning. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010, pp. 384–394 (2010)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Proceedings of the Twenty-Seventh Annual Conference on Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)
Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1532–1543 (2014)
Melamud, O., Goldberger, J., Dagan, I.: context2vec: learning generic context embedding with bidirectional LSTM. In: CoNLL (2016)
McCann, B., Bradbury, J., Xiong, C., Socher, R.: Learned in translation: contextualized word vectors. In: NIPS 2017 (2017)
Peters, M.E., et al.: Deep contextualized word representations. arXiv preprint arXiv:1802.05365 (2018)
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 6000–6010 (2017)
Radford, A., Narasimhan, K., Salimans, T., Sutskever, I.: Improving language understanding with unsupervised learning. Technical report, OpenAI (2018)
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Starostin, A.S., et al.: FactRuEval 2016: evaluation of named entity recognition and fact extraction systems for Russian. In: Proceedings of the Annual International Conference on Computational Linguistics and Intellectual Technologies, Dialogue No. 15, pp. 720–738 (2016)
conlleval.py: Python port of the CoNLL evaluation script. https://github.com/mayhewsw/conlleval.py/blob/master/conlleval.py
Anh, L.T., Arkhipov, M.Y., Burtsev, M.S.: Application of a hybrid Bi-LSTM-CRF model to the task of Russian named entity recognition. arXiv preprint arXiv:1709.09686 (2017)
Konoplich, G., Putin, E., Filchenkov, A., Rybka, R.: Named entity recognition in Russian with word representation learned by a bidirectional language model. In: Ustalov, D., Filchenkov, A., Pivovarova, L., Žižka, J. (eds.) AINL 2018. CCIS, vol. 930, pp. 48–58. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01204-5_5