Multi-Round Transfer Learning for Low-Resource NMT Using Multiple High-Resource Languages

Abstract

Neural machine translation (NMT) has made remarkable progress in recent years, but its performance suffers from data sparsity because large-scale parallel corpora are readily available only for high-resource languages (HRLs). Transfer learning (TL) has recently been applied widely to machine translation for low-resource languages (LRLs) and has become one of the principal directions for addressing the data sparsity problem in low-resource NMT. In its standard form, TL for NMT initializes the low-resource model (the child) with the parameters of a high-resource model (the parent). However, this original TL approach can neither make full use of multiple highly related HRLs nor allow the child to receive different parameters from the same parent. To exploit multiple HRLs effectively, we present a language-independent and straightforward multi-round transfer learning (MRTL) approach to low-resource NMT. In addition, to reduce the character-level differences between high-resource and low-resource languages, we introduce a unified transliteration method for language families whose members are semantically and syntactically highly analogous to each other. Experiments on low-resource datasets show that our approaches are effective, significantly outperform state-of-the-art methods, and yield improvements of up to 5.63 BLEU points.
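
As a rough illustration of the parent-child transfer scheme summarized above, the sketch below shows one possible reading of how multiple high-resource parents could be chained over several rounds. It is a minimal sketch, assuming a generic dictionary-of-parameters interface: the `train` and `transfer` helpers, the `Params` alias, and the corpus arguments are hypothetical placeholders, not the paper's implementation.

```python
# Minimal sketch of multi-round transfer learning (MRTL) for low-resource NMT.
# All names here (train/transfer helpers, parameter dicts, corpus arguments)
# are hypothetical illustrations of the parent->child initialization idea,
# not the authors' actual implementation.

import copy
from typing import Dict, List

Params = Dict[str, list]  # parameter name -> weights (placeholder representation)


def train(params: Params, corpus: str) -> Params:
    """Stand-in for an NMT training loop on `corpus`; a real system would
    update the parameters with gradient steps and return them."""
    return params


def transfer(parent: Params, child: Params) -> Params:
    """Initialize the child model from the parent: copy the parent's value for
    every parameter name the child also defines, keep the child's own value
    otherwise."""
    initialized = dict(child)
    for name, value in parent.items():
        if name in initialized:
            initialized[name] = copy.deepcopy(value)
    return initialized


def multi_round_transfer(child: Params,
                         hrl_corpora: List[str],
                         lrl_corpus: str) -> Params:
    """One round per high-resource corpus: train a parent starting from the
    previous round's parameters, transfer them to the child, then fine-tune
    the child on the low-resource corpus."""
    params = child
    for hrl_corpus in hrl_corpora:
        parent = train(params, hrl_corpus)   # train the parent on HRL data
        params = transfer(parent, child)     # initialize the child from the parent
        params = train(params, lrl_corpus)   # fine-tune on LRL data
    return params
```

In this reading, each additional related high-resource language contributes one more transfer round, so the child is initialized repeatedly rather than from a single parent; the exact round structure in the paper may differ.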


• Published in

  ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 18, Issue 4
  December 2019, 305 pages
  ISSN: 2375-4699
  EISSN: 2375-4702
  DOI: 10.1145/3327969

          Copyright © 2019 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Received: 1 June 2018
          • Accepted: 1 January 2019
          • Published: 21 May 2019

          Published in TALLIP Volume 18, Issue 4

