
Approximating to the Real Translation Quality for Neural Machine Translation via Causal Motivated Methods

Published: 09 May 2023

Abstract

It is difficult to evaluate translations objectively and accurately, which limits the applications of machine translation. In this article, we assume that this phenomenon is caused by noise interference during translation evaluation, and we address the problem from the perspective of causal inference. We assume that the observable translation score is simultaneously affected by the unobservable true translation quality and by noise. If there is a variable that is related to the noise but independent of the true translation quality, the related noise can be eliminated by removing the effect of that variable from the observed score. Based on this causal hypothesis, this article studies the length bias problem of beam search for neural machine translation (NMT) and the input-related noise problem of translation quality estimation (QE). For the NMT length bias problem, we conduct experiments on four typical translation tasks (Uyghur–Chinese, Chinese–English, English–German, and English–French) with datasets of different scales. Compared with previous approaches, the proposed causally motivated method is model-agnostic and requires no supervised training. For the QE task, we conduct experiments on the WMT’20 submissions. Experimental results show that the denoised QE scores achieve higher Pearson correlation with human-assessed scores than the original submissions. Further analyses on the NMT and QE tasks also demonstrate the rationality of the empirical assumptions made in our methods.
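
The core operation the abstract describes, removing the effect of a noise-related variable Z from an observed score Y to approximate the unobservable true quality Q, can be realized by regressing Y on Z and keeping the residual. The sketch below illustrates this for the length bias setting, with hypothesis length as the noise-related variable and an ordinary least-squares fit; all names (denoise_scores, the toy data) are illustrative assumptions, not the paper's actual implementation.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def denoise_scores(observed_scores: np.ndarray, noise_proxy: np.ndarray) -> np.ndarray:
        """Remove the part of the observed scores explainable by the noise proxy.

        Assumes the proxy (e.g., hypothesis length) correlates with the noise
        but is independent of the true translation quality, so subtracting
        E[Y|Z] leaves an estimate of the quality signal, up to a constant.
        """
        z = noise_proxy.reshape(-1, 1)                  # one noise feature per sentence
        fit = LinearRegression().fit(z, observed_scores)
        residual = observed_scores - fit.predict(z)     # Y - E[Y|Z]
        return residual + observed_scores.mean()        # keep the original scale

    # Toy usage: beam-search log-probabilities biased against longer hypotheses.
    rng = np.random.default_rng(0)
    lengths = rng.integers(5, 50, size=200).astype(float)
    quality = rng.normal(size=200)                      # unobservable true quality
    scores = quality - 0.1 * lengths + rng.normal(scale=0.2, size=200)
    denoised = denoise_scores(scores, lengths)          # should track `quality` more closely

Under these assumptions, ranking beam-search candidates by the denoised score rather than the raw score is what would counteract the length bias; in the QE setting, the same residualization would be applied to submitted scores before computing Pearson correlation with human judgments.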



    • Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 5
      May 2023
      653 pages
      ISSN:2375-4699
      EISSN:2375-4702
      DOI:10.1145/3596451


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 9 May 2023
      • Online AM: 13 February 2023
      • Accepted: 3 February 2023
      • Revised: 11 November 2022
      • Received: 7 April 2022
Published in TALLIP Volume 22, Issue 5

      Qualifiers

      • research-article