Abstract
This article presents the results of experiments on applying various methods and algorithms to building a Russian-Tatar machine translation system. As the baseline, we used a neural approach based on the Transformer architecture, together with back-translation algorithms that enlarge the parallel training data using monolingual corpora. For the first time for the Russian-Tatar language pair, we conducted experiments on transfer learning (based on a Kazakh-Russian parallel corpus). As the main training data, we created and used a parallel corpus of about 1 million Russian-Tatar sentence pairs. The experiments show that the resulting system outperforms the currently existing Russian-Tatar translators. The best quality in the Russian-Tatar translation direction was achieved by our baseline model (BLEU 35.4), and in the Tatar-Russian direction by the model trained with back-translation (BLEU 39.2).
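As a brief illustration of the back-translation idea mentioned above, the sketch below shows how monolingual Tatar text can be turned into synthetic Russian-Tatar training pairs. This is a minimal Python sketch of the general technique, not the authors' implementation; the `translate_ta_ru` callable is a hypothetical stand-in for an already-trained Tatar-to-Russian model.

```python
from typing import Callable, Iterable, List, Tuple

def back_translate(
    monolingual_tatar: Iterable[str],
    translate_ta_ru: Callable[[str], str],  # hypothetical trained Tatar->Russian model
) -> List[Tuple[str, str]]:
    """Build synthetic (Russian, Tatar) pairs from monolingual Tatar text.

    The Russian side is machine-translated and therefore noisy, while the
    Tatar side is genuine text; in the standard back-translation setup the
    genuine side is used as the target when training the forward
    (Russian -> Tatar) system.
    """
    pairs: List[Tuple[str, str]] = []
    for ta_sentence in monolingual_tatar:
        ru_synthetic = translate_ta_ru(ta_sentence)  # back-translate to Russian
        pairs.append((ru_synthetic, ta_sentence))    # (synthetic source, genuine target)
    return pairs
```

The synthetic pairs are then mixed with the genuine parallel corpus (here, roughly 1 million sentence pairs) before training the forward model. BLEU scores such as those reported above can be computed, for example, with the sacrebleu package (one common implementation of the metric; the sentences below are purely illustrative):

```python
import sacrebleu  # pip install sacrebleu

hypotheses = ["мин сине яратам"]    # system outputs, one string per test sentence
references = [["мин сине яратам"]]  # one list of reference translations per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")
```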
Acknowledgments
The reported study was funded by RFBR, project number 20-07-00823.