Character-Aware Low-Resource Neural Machine Translation with Weight Sharing and Pre-training

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11856)

Abstract

Neural Machine Translation (NMT) has recently achieved state-of-the-art results on many machine translation tasks, but it still suffers from the scarcity of parallel corpora, especially for low-resource language pairs, and its performance degrades sharply for such languages. To address this problem, we describe a novel NMT model that is based on the encoder-decoder architecture and operates on character-level inputs. The proposed model applies Convolutional Neural Networks (CNNs) and highway networks over the character inputs, and their outputs are fed to an encoder-decoder neural machine translation network. We further present two approaches that improve low-resource NMT performance. First, we pre-train and initialize the full model with a language-modeling objective implemented as denoising autoencoding. Second, we share the weights of the first few layers of the two encoders between the two languages to strengthen the encoding ability of the model. We evaluate our model on two low-resource language pairs. On the IWSLT2015 English-Vietnamese translation task, the proposed model obtains improvements of up to 2.5 BLEU points over the baseline, and on the CWMT2018 Chinese-Mongolian translation task it outperforms the baseline by more than 3 BLEU points.
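
As an illustration of the character-aware input layer described in the abstract (a CNN over character embeddings followed by a highway network, whose outputs feed the encoder-decoder), the following is a minimal PyTorch sketch. It is not the authors' implementation; all class names, dimensions, filter widths, and the two-layer highway depth are assumptions made for this example.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Highway(nn.Module):
    """Highway layer: y = t * relu(W_h x) + (1 - t) * x, with gate t = sigmoid(W_t x)."""
    def __init__(self, dim: int, num_layers: int = 1):
        super().__init__()
        self.transforms = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_layers)])
        self.gates = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_layers)])

    def forward(self, x):
        for transform, gate in zip(self.transforms, self.gates):
            t = torch.sigmoid(gate(x))
            x = t * F.relu(transform(x)) + (1.0 - t) * x
        return x

class CharCNNEmbedder(nn.Module):
    """Builds word representations from characters: embed -> CNN -> max-pool -> highway."""
    def __init__(self, num_chars=128, char_dim=16,
                 filter_widths=(1, 2, 3, 4, 5), filters_per_width=32):
        super().__init__()
        self.char_emb = nn.Embedding(num_chars, char_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            [nn.Conv1d(char_dim, filters_per_width, kernel_size=w) for w in filter_widths]
        )
        self.out_dim = filters_per_width * len(filter_widths)
        self.highway = Highway(self.out_dim, num_layers=2)

    def forward(self, char_ids):
        # char_ids: (batch, seq_len, max_word_len) integer character indices
        b, s, c = char_ids.shape
        x = self.char_emb(char_ids.view(b * s, c))           # (b*s, max_word_len, char_dim)
        x = x.transpose(1, 2)                                # (b*s, char_dim, max_word_len)
        pooled = [conv(x).max(dim=2).values for conv in self.convs]
        word_vecs = self.highway(torch.cat(pooled, dim=1))   # (b*s, out_dim)
        return word_vecs.view(b, s, self.out_dim)             # sequence fed to the NMT encoder

# Example: 2 sentences, 7 words each, at most 12 characters per word.
embedder = CharCNNEmbedder()
chars = torch.randint(1, 128, (2, 7, 12))
print(embedder(chars).shape)  # torch.Size([2, 7, 160])

The resulting word vectors would replace ordinary word embeddings at the encoder input; pre-training with denoising autoencoding and sharing the first encoder layers across the two languages, as described above, operate on top of this representation.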

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant No. 61572462 and the 13th Five-Year Informatization Plan of the Chinese Academy of Sciences under Grant No. XXH13505-03-203.

Author information

Corresponding author

Correspondence to Miao Li.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Cao, Y., Li, M., Feng, T., Wang, R. (2019). Character-Aware Low-Resource Neural Machine Translation with Weight Sharing and Pre-training. In: Sun, M., Huang, X., Ji, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics. CCL 2019. Lecture Notes in Computer Science (LNAI), vol 11856. Springer, Cham. https://doi.org/10.1007/978-3-030-32381-3_26

  • DOI: https://doi.org/10.1007/978-3-030-32381-3_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32380-6

  • Online ISBN: 978-3-030-32381-3

  • eBook Packages: Computer Science, Computer Science (R0)
