Abstract
Neural machine translation (NMT) has made remarkable progress in recent years, but its performance suffers from a data sparsity problem, since large-scale parallel corpora are readily available only for high-resource languages (HRLs). Transfer learning (TL) has recently been applied widely to machine translation for low-resource languages (LRLs), and it has become one of the principal directions for addressing data sparsity in low-resource NMT. In its standard form, TL for NMT initializes the low-resource (child) model with a high-resource (parent) model. However, this original formulation can neither make full use of multiple closely related HRLs nor obtain different parameters from the same parent. To exploit multiple HRLs effectively, we present a straightforward, language-independent multi-round transfer learning (MRTL) approach for low-resource NMT. In addition, to reduce the differences between high-resource and low-resource languages at the character level, we introduce a unified transliteration method for languages of various families that are semantically and syntactically highly analogous to each other. Experiments on low-resource datasets show that our approaches are effective, significantly outperform state-of-the-art methods, and yield improvements of up to 5.63 BLEU points.
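The transfer chain described above can be illustrated with a minimal, runnable sketch. Here a plain dict stands in for real NMT model weights, `train` is a placeholder that only records which corpus last updated the parameters, and the language-pair names (`tr-en`, `az-en`, `ug-en`) and the exact round structure are illustrative assumptions, not the paper's actual procedure.

```python
def train(params, corpus):
    """Stand-in for an NMT training run.

    Real transfer learning would continue optimizing the parent's weights;
    here we just append to a training history so the chain is visible.
    """
    return {**params, "history": params.get("history", []) + [corpus]}


def single_round_transfer(hrl_corpus, lrl_corpus):
    """Classic parent->child transfer: train on the high-resource pair,
    then fine-tune the same parameters on the low-resource pair."""
    parent = train({}, hrl_corpus)
    child = train(parent, lrl_corpus)
    return child


def multi_round_transfer(hrl_corpora, lrl_corpus):
    """Multi-round transfer: each round's child model initializes the next
    round, so several related HRLs contribute in turn before the final
    low-resource model."""
    params = {}
    for hrl in hrl_corpora:
        params = train(params, hrl)        # transfer from the next HRL parent
        params = train(params, lrl_corpus)  # fine-tune on the LRL pair
    return params


model = multi_round_transfer(["tr-en", "az-en"], "ug-en")
print(model["history"])  # -> ['tr-en', 'ug-en', 'az-en', 'ug-en']
```

The point of the multi-round loop is that the low-resource model is never limited to a single parent: parameters trained in one round become the initialization for the next, which is how multiple highly related HRLs can all be exploited.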