
An Ensemble Strategy with Gradient Conflict for Multi-Domain Neural Machine Translation

Published: 08 February 2024

Abstract

Multi-domain neural machine translation aims to construct a unified neural machine translation model that translates sentences across various domains. Nevertheless, previous studies share one limitation: the inability to acquire both domain-general and domain-specific representations concurrently. To this end, we propose an ensemble strategy with gradient conflict for multi-domain neural machine translation that automatically learns model parameters by identifying both domain-shared and domain-specific features. Specifically, our approach consists of (1) a parameter-sharing framework, in which the parameters of all layers are initially shared and identical across domains, and (2) an ensemble strategy, in which we design an Extra Ensemble strategy via a piecewise condition function to learn direction- and distance-based gradient conflict. In addition, we give a detailed theoretical analysis of gradient conflict to further validate the effectiveness of our approach. Experimental results on two multi-domain datasets show the superior performance of our proposed model compared with previous work.
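The abstract does not spell out the piecewise condition function. As a rough illustration only, the PyTorch sketch below shows one way a gradient-ensemble rule could branch on direction-based (cosine similarity) and distance-based (norm) conflict between a domain-general and a domain-specific gradient; the function name, thresholds, and the projection fallback (borrowed from gradient surgery, Yu et al. 2020) are assumptions, not the paper's actual formulation.

```python
import torch

def combine_gradients(g_gen: torch.Tensor,
                      g_spec: torch.Tensor,
                      cos_thresh: float = 0.0,
                      dist_thresh: float = 1.0) -> torch.Tensor:
    """Piecewise combination of a domain-general gradient (g_gen) and a
    domain-specific gradient (g_spec), both flattened to 1-D vectors.

    The branch structure mirrors the abstract's idea of direction- and
    distance-based gradient conflict; the thresholds and fallback rules
    are illustrative assumptions.
    """
    eps = 1e-12
    cos = torch.dot(g_gen, g_spec) / (g_gen.norm() * g_spec.norm() + eps)
    dist = (g_gen - g_spec).norm()

    if cos >= cos_thresh and dist <= dist_thresh:
        # No conflict detected: a plain average keeps both learning signals.
        return 0.5 * (g_gen + g_spec)
    if cos < cos_thresh:
        # Direction-based conflict: drop the component of g_spec that
        # opposes g_gen (in the spirit of gradient surgery, Yu et al. 2020).
        g_spec = g_spec - (torch.dot(g_spec, g_gen) / (g_gen.norm() ** 2 + eps)) * g_gen
        return 0.5 * (g_gen + g_spec)
    # Distance-based conflict only: rescale g_spec to g_gen's magnitude
    # so that neither gradient dominates the shared parameters.
    return 0.5 * (g_gen + g_spec * (g_gen.norm() / (g_spec.norm() + eps)))


# Toy usage: two 4-dimensional gradients pointing in conflicting directions.
g_general = torch.tensor([1.0, 0.5, -0.2, 0.3])
g_domain = torch.tensor([-0.8, 0.4, 0.1, 0.2])
print(combine_gradients(g_general, g_domain))
```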




    Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 2
February 2024
340 pages
EISSN: 2375-4702
DOI: 10.1145/3613556

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 February 2024
    Online AM: 21 December 2023
    Accepted: 14 December 2023
    Revised: 10 December 2023
    Received: 19 April 2023
    Published in TALLIP Volume 23, Issue 2


    Author Tags

    1. Multi-domain neural machine translation
    2. domain-specific
    3. gradient conflict

    Qualifiers

    • Research-article

    Funding Sources

• National Natural Science Foundation of China
