Abstract
Neural Machine Translation (NMT) has recently achieved state-of-the-art results on many machine translation tasks, but one challenge it faces is the scarcity of parallel corpora, especially for low-resource language pairs. As a result, NMT performs considerably worse on low-resource languages. To address this problem, we describe a novel NMT model based on the encoder-decoder architecture that operates on character-level inputs. Our proposed model applies Convolutional Neural Networks (CNN) and highway networks over character inputs, and feeds their outputs into an encoder-decoder neural machine translation network. We further present two approaches that improve low-resource NMT performance. First, we pre-train and initialize the full model with a language-modeling objective implemented as denoising autoencoding. Second, we share the weights of the first few layers of the two encoders across the two languages to strengthen the model's encoding ability. We demonstrate our model on two low-resource language pairs. On the IWSLT2015 English-Vietnamese translation task, our proposed model obtains improvements of up to 2.5 BLEU points over the baseline. We also outperform the baseline by more than 3 BLEU points on the CWMT2018 Chinese-Mongolian translation task.
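The character-aware input layer can be pictured concretely. Below is a minimal PyTorch sketch of a CNN-plus-highway module over character embeddings in the spirit the abstract describes; the class name `CharCNNHighway` and all layer sizes and kernel widths are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class CharCNNHighway(nn.Module):
    # Character embeddings -> multi-width 1D convolutions -> max-over-time
    # pooling -> highway layers. The output is a word-level vector that an
    # encoder-decoder NMT network can consume in place of word embeddings.
    def __init__(self, n_chars, char_dim=16, kernel_widths=(1, 2, 3, 4, 5),
                 n_filters=64, n_highway=2):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            [nn.Conv1d(char_dim, n_filters, w) for w in kernel_widths])
        out_dim = n_filters * len(kernel_widths)
        self.transform = nn.ModuleList(
            [nn.Linear(out_dim, out_dim) for _ in range(n_highway)])
        self.gate = nn.ModuleList(
            [nn.Linear(out_dim, out_dim) for _ in range(n_highway)])

    def forward(self, chars):
        # chars: (batch, seq_len, max_word_len) integer character ids
        b, s, w = chars.shape
        x = self.char_emb(chars.view(b * s, w)).transpose(1, 2)  # (b*s, dim, w)
        # Convolve over character positions, then max-pool over time.
        feats = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        h = torch.cat(feats, dim=1)                               # (b*s, out_dim)
        for t, g in zip(self.transform, self.gate):
            gate = torch.sigmoid(g(h))  # how much transformed signal to pass
            h = gate * torch.relu(t(h)) + (1 - gate) * h
        return h.view(b, s, -1)  # word-level vectors for the NMT encoder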
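The denoising-autoencoding pre-training amounts to training the encoder-decoder to reconstruct a sentence from a corrupted copy of itself, then using the learned weights to initialize the full translation model. The abstract does not specify the noise model, so the sketch below assumes the word dropping plus local shuffling that is common practice in unsupervised NMT; the function `corrupt` and its hyperparameters are hypothetical.

```python
import random

def corrupt(tokens, p_drop=0.1, k=3):
    # Noise for denoising autoencoding: drop each token with probability
    # p_drop, then shuffle the survivors within a local window of width k.
    kept = [t for t in tokens if random.random() > p_drop] or tokens[:1]
    keys = [i + random.uniform(0, k) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept), key=lambda p: p[0])]

# Pre-training step: the model maps corrupt(x) back to x under the usual
# cross-entropy loss, for monolingual data in each language.
```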
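Sharing the front layers of the two language-specific encoders means the same parameter tensors receive gradients from both languages. A minimal sketch of one way to realize this, assuming a generic `make_layer` factory (a hypothetical name, not from the paper):

```python
import torch.nn as nn

def build_encoders(make_layer, n_layers=4, n_shared=2):
    # The first n_shared layer objects are literally the same modules in
    # both stacks, so both languages are encoded by, and update, the same
    # front layers; the remaining layers stay language-specific.
    shared = [make_layer() for _ in range(n_shared)]
    enc_src = nn.ModuleList(shared + [make_layer() for _ in range(n_layers - n_shared)])
    enc_tgt = nn.ModuleList(shared + [make_layer() for _ in range(n_layers - n_shared)])
    return enc_src, enc_tgt
```

Because the shared modules are the same Python objects, PyTorch deduplicates their parameters when both encoders live under one parent module, so a single optimizer updates them once per step.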
Acknowledgements
This work is supported by the National Natural Science Foundation of China under Grant No. 61572462 and by the 13th Five-Year Informatization Plan of the Chinese Academy of Sciences under Grant No. XXH13505-03-203.
© 2019 Springer Nature Switzerland AG
Cite this paper
Cao, Y., Li, M., Feng, T., Wang, R. (2019). Character-Aware Low-Resource Neural Machine Translation with Weight Sharing and Pre-training. In: Sun, M., Huang, X., Ji, H., Liu, Z., Liu, Y. (eds.) Chinese Computational Linguistics. CCL 2019. Lecture Notes in Computer Science, vol. 11856. Springer, Cham. https://doi.org/10.1007/978-3-030-32381-3_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32380-6
Online ISBN: 978-3-030-32381-3