Abstract
This article presents the results of experiments on applying various methods and algorithms to building a Russian-Tatar machine translation system. As the baseline, we used a neural approach based on the Transformer architecture, together with back-translation algorithms that enlarge the parallel training data using monolingual corpora. For the first time for the Russian-Tatar language pair, we conducted experiments on transfer learning (based on a Kazakh-Russian parallel corpus). As the main training data, we created and used a parallel corpus of about 1 million Russian-Tatar sentence pairs. The experiments show that the resulting system outperforms the currently existing Russian-Tatar translators. The best quality in the Russian-Tatar translation direction was achieved by our baseline model (BLEU 35.4), and in the Tatar-Russian direction by the model trained with back-translation (BLEU 39.2).
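As a brief illustration of the back-translation idea mentioned above, the sketch below shows how monolingual Tatar text can be turned into synthetic Russian-Tatar training pairs. This is a minimal Python sketch of the general technique, not the authors' implementation; the `translate_ta_ru` callable is a hypothetical stand-in for an already-trained Tatar-to-Russian model.

```python
from typing import Callable, Iterable, List, Tuple

def back_translate(
    monolingual_tatar: Iterable[str],
    translate_ta_ru: Callable[[str], str],  # hypothetical trained Tatar->Russian model
) -> List[Tuple[str, str]]:
    """Build synthetic (Russian, Tatar) pairs from monolingual Tatar text.

    The Russian side is machine-translated and therefore noisy, while the
    Tatar side is genuine text; in the standard back-translation setup the
    genuine side is used as the target when training the forward
    (Russian -> Tatar) system.
    """
    pairs: List[Tuple[str, str]] = []
    for ta_sentence in monolingual_tatar:
        ru_synthetic = translate_ta_ru(ta_sentence)  # back-translate to Russian
        pairs.append((ru_synthetic, ta_sentence))    # (synthetic source, genuine target)
    return pairs
```

The synthetic pairs are then mixed with the genuine parallel corpus (here, roughly 1 million sentence pairs) before training the forward model. BLEU scores such as those reported above can be computed, for example, with the sacrebleu package (one common implementation of the metric; the sentences below are purely illustrative):

```python
import sacrebleu  # pip install sacrebleu

hypotheses = ["мин сине яратам"]    # system outputs, one string per test sentence
references = [["мин сине яратам"]]  # one list of reference translations per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")
```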
Acknowledgments
The reported study was funded by RFBR, project number 20-07-00823.