Abstract
Neural Machine Translation (NMT) has recently achieved state-of-the-art results on many machine translation tasks, but one challenge it faces is the scarcity of parallel corpora, especially for low-resource language pairs. As a result, NMT performs considerably worse on low-resource languages. To address this problem, we describe a novel NMT model based on the encoder-decoder architecture that operates on character-level inputs. Our proposed model applies Convolutional Neural Networks (CNN) and highway networks over character inputs, and feeds their outputs into an encoder-decoder neural machine translation network. We further present two approaches that improve low-resource NMT performance. First, we pre-train and initialize the full model with a language-modeling objective implemented as denoising autoencoding. Second, we share the weights of the first few layers of the two encoders across the two languages to strengthen the model's encoding ability. We demonstrate our model on two low-resource language pairs. On the IWSLT2015 English-Vietnamese translation task, our proposed model obtains improvements of up to 2.5 BLEU points over the baseline. We also outperform the baseline by more than 3 BLEU points on the CWMT2018 Chinese-Mongolian translation task.
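The character-aware input layer can be pictured concretely. Below is a minimal PyTorch sketch of a CNN-plus-highway module over character embeddings in the spirit the abstract describes; the class name `CharCNNHighway` and all layer sizes and kernel widths are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class CharCNNHighway(nn.Module):
    # Character embeddings -> multi-width 1D convolutions -> max-over-time
    # pooling -> highway layers. The output is a word-level vector that an
    # encoder-decoder NMT network can consume in place of word embeddings.
    def __init__(self, n_chars, char_dim=16, kernel_widths=(1, 2, 3, 4, 5),
                 n_filters=64, n_highway=2):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            [nn.Conv1d(char_dim, n_filters, w) for w in kernel_widths])
        out_dim = n_filters * len(kernel_widths)
        self.transform = nn.ModuleList(
            [nn.Linear(out_dim, out_dim) for _ in range(n_highway)])
        self.gate = nn.ModuleList(
            [nn.Linear(out_dim, out_dim) for _ in range(n_highway)])

    def forward(self, chars):
        # chars: (batch, seq_len, max_word_len) integer character ids
        b, s, w = chars.shape
        x = self.char_emb(chars.view(b * s, w)).transpose(1, 2)  # (b*s, dim, w)
        # Convolve over character positions, then max-pool over time.
        feats = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        h = torch.cat(feats, dim=1)                               # (b*s, out_dim)
        for t, g in zip(self.transform, self.gate):
            gate = torch.sigmoid(g(h))  # how much transformed signal to pass
            h = gate * torch.relu(t(h)) + (1 - gate) * h
        return h.view(b, s, -1)  # word-level vectors for the NMT encoder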
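The denoising-autoencoding pre-training amounts to training the encoder-decoder to reconstruct a sentence from a corrupted copy of itself, then using the learned weights to initialize the full translation model. The abstract does not specify the noise model, so the sketch below assumes the word dropping plus local shuffling that is common practice in unsupervised NMT; the function `corrupt` and its hyperparameters are hypothetical.

```python
import random

def corrupt(tokens, p_drop=0.1, k=3):
    # Noise for denoising autoencoding: drop each token with probability
    # p_drop, then shuffle the survivors within a local window of width k.
    kept = [t for t in tokens if random.random() > p_drop] or tokens[:1]
    keys = [i + random.uniform(0, k) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept), key=lambda p: p[0])]

# Pre-training step: the model maps corrupt(x) back to x under the usual
# cross-entropy loss, for monolingual data in each language.
```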
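Sharing the front layers of the two language-specific encoders means the same parameter tensors receive gradients from both languages. A minimal sketch of one way to realize this, assuming a generic `make_layer` factory (a hypothetical name, not from the paper):

```python
import torch.nn as nn

def build_encoders(make_layer, n_layers=4, n_shared=2):
    # The first n_shared layer objects are literally the same modules in
    # both stacks, so both languages are encoded by, and update, the same
    # front layers; the remaining layers stay language-specific.
    shared = [make_layer() for _ in range(n_shared)]
    enc_src = nn.ModuleList(shared + [make_layer() for _ in range(n_layers - n_shared)])
    enc_tgt = nn.ModuleList(shared + [make_layer() for _ in range(n_layers - n_shared)])
    return enc_src, enc_tgt
```

Because the shared modules are the same Python objects, PyTorch deduplicates their parameters when both encoders live under one parent module, so a single optimizer updates them once per step.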
Acknowledgements
This work is supported by the National Natural Science Foundation of China under Grant No. 61572462 and by the 13th Five-Year Informatization Plan of the Chinese Academy of Sciences under Grant No. XXH13505-03-203.
© 2019 Springer Nature Switzerland AG
Cite this paper
Cao, Y., Li, M., Feng, T., Wang, R. (2019). Character-Aware Low-Resource Neural Machine Translation with Weight Sharing and Pre-training. In: Sun, M., Huang, X., Ji, H., Liu, Z., Liu, Y. (eds.) Chinese Computational Linguistics. CCL 2019. Lecture Notes in Computer Science, vol. 11856. Springer, Cham. https://doi.org/10.1007/978-3-030-32381-3_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32380-6
Online ISBN: 978-3-030-32381-3