Abstract
Neural machine translation (NMT) has made remarkable progress in recent years, but its performance suffers from a data sparsity problem, since large-scale parallel corpora are readily available only for high-resource languages (HRLs). Transfer learning (TL) has recently been applied widely to machine translation for low-resource languages (LRLs), and it has become one of the principal directions for addressing data sparsity in low-resource NMT. In its standard form, TL for NMT initializes the low-resource (child) model with a high-resource (parent) model. However, this original formulation can neither make full use of multiple closely related HRLs nor obtain different parameters from the same parent. To exploit multiple HRLs effectively, we present a straightforward, language-independent multi-round transfer learning (MRTL) approach for low-resource NMT. In addition, to reduce the differences between high-resource and low-resource languages at the character level, we introduce a unified transliteration method for languages of various families that are semantically and syntactically highly analogous to each other. Experiments on low-resource datasets show that our approaches are effective, significantly outperform state-of-the-art methods, and yield improvements of up to 5.63 BLEU points.
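The transfer chain described above can be illustrated with a minimal, runnable sketch. Here a plain dict stands in for real NMT model weights, `train` is a placeholder that only records which corpus last updated the parameters, and the language-pair names (`tr-en`, `az-en`, `ug-en`) and the exact round structure are illustrative assumptions, not the paper's actual procedure.

```python
def train(params, corpus):
    """Stand-in for an NMT training run.

    Real transfer learning would continue optimizing the parent's weights;
    here we just append to a training history so the chain is visible.
    """
    return {**params, "history": params.get("history", []) + [corpus]}


def single_round_transfer(hrl_corpus, lrl_corpus):
    """Classic parent->child transfer: train on the high-resource pair,
    then fine-tune the same parameters on the low-resource pair."""
    parent = train({}, hrl_corpus)
    child = train(parent, lrl_corpus)
    return child


def multi_round_transfer(hrl_corpora, lrl_corpus):
    """Multi-round transfer: each round's child model initializes the next
    round, so several related HRLs contribute in turn before the final
    low-resource model."""
    params = {}
    for hrl in hrl_corpora:
        params = train(params, hrl)        # transfer from the next HRL parent
        params = train(params, lrl_corpus)  # fine-tune on the LRL pair
    return params


model = multi_round_transfer(["tr-en", "az-en"], "ug-en")
print(model["history"])  # -> ['tr-en', 'ug-en', 'az-en', 'ug-en']
```

The point of the multi-round loop is that the low-resource model is never limited to a single parent: parameters trained in one round become the initialization for the next, which is how multiple highly related HRLs can all be exploited.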