
An Ensemble Strategy with Gradient Conflict for Multi-Domain Neural Machine Translation

Published: 08 February 2024

Abstract

Multi-domain neural machine translation aims to construct a unified neural machine translation model that translates sentences across various domains. Nevertheless, previous studies share one limitation: the inability to acquire both domain-general and domain-specific representations concurrently. To this end, we propose an ensemble strategy with gradient conflict for multi-domain neural machine translation that automatically learns model parameters by identifying both domain-shared and domain-specific features. Specifically, our approach consists of (1) a parameter-sharing framework, in which the parameters of all layers are initially shared and identical across domains, and (2) an ensemble strategy, in which we design an Extra Ensemble strategy via a piecewise condition function to learn direction- and distance-based gradient conflict. In addition, we give a detailed theoretical analysis of gradient conflict to further validate the effectiveness of our approach. Experimental results on two multi-domain datasets show the superior performance of our proposed model compared with previous work.
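The abstract does not spell out the piecewise condition function. As a rough illustration only, the PyTorch sketch below shows one way a gradient-ensemble rule could branch on direction-based (cosine similarity) and distance-based (norm) conflict between a domain-general and a domain-specific gradient; the function name, thresholds, and the projection fallback (borrowed from gradient surgery, Yu et al. 2020) are assumptions, not the paper's actual formulation.

```python
import torch

def combine_gradients(g_gen: torch.Tensor,
                      g_spec: torch.Tensor,
                      cos_thresh: float = 0.0,
                      dist_thresh: float = 1.0) -> torch.Tensor:
    """Piecewise combination of a domain-general gradient (g_gen) and a
    domain-specific gradient (g_spec), both flattened to 1-D vectors.

    The branch structure mirrors the abstract's idea of direction- and
    distance-based gradient conflict; the thresholds and fallback rules
    are illustrative assumptions.
    """
    eps = 1e-12
    cos = torch.dot(g_gen, g_spec) / (g_gen.norm() * g_spec.norm() + eps)
    dist = (g_gen - g_spec).norm()

    if cos >= cos_thresh and dist <= dist_thresh:
        # No conflict detected: a plain average keeps both learning signals.
        return 0.5 * (g_gen + g_spec)
    if cos < cos_thresh:
        # Direction-based conflict: drop the component of g_spec that
        # opposes g_gen (in the spirit of gradient surgery, Yu et al. 2020).
        g_spec = g_spec - (torch.dot(g_spec, g_gen) / (g_gen.norm() ** 2 + eps)) * g_gen
        return 0.5 * (g_gen + g_spec)
    # Distance-based conflict only: rescale g_spec to g_gen's magnitude
    # so that neither gradient dominates the shared parameters.
    return 0.5 * (g_gen + g_spec * (g_gen.norm() / (g_spec.norm() + eps)))


# Toy usage: two 4-dimensional gradients pointing in conflicting directions.
g_general = torch.tensor([1.0, 0.5, -0.2, 0.3])
g_domain = torch.tensor([-0.8, 0.4, 0.1, 0.2])
print(combine_gradients(g_general, g_domain))
```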




    Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 2
February 2024
340 pages
EISSN: 2375-4702
DOI: 10.1145/3613556

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 February 2024
    Online AM: 21 December 2023
    Accepted: 14 December 2023
    Revised: 10 December 2023
    Received: 19 April 2023
    Published in TALLIP Volume 23, Issue 2


    Author Tags

    1. Multi-domain neural machine translation
    2. domain-specific
    3. gradient conflict

    Qualifiers

    • Research-article

    Funding Sources

• National Natural Science Foundation of China
