research-article

Approximating to the Real Translation Quality for Neural Machine Translation via Causal Motivated Methods

Authors:
Xuewen Shi (ORCID: 0000-0002-3930-0532)
Heyan Huang (ORCID: 0000-0002-0320-7520)
Ping Jian (ORCID: 0000-0001-7236-2922)
Yi-Kun Tang (ORCID: 0000-0001-5419-4769)

Affiliation (all authors): School of Computer Science and Technology, Beijing Institute of Technology, Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications, Haidian District, Beijing, China

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 5, Article No. 126, pp. 1–26. https://doi.org/10.1145/3583684

Published: 09 May 2023

Abstract

Evaluating translations objectively and accurately is hard, which limits the applications of machine translation. In this article, we assume that this difficulty is caused by noise interference during translation evaluation, and we address it from the perspective of causal inference. We assume that the observable translation score is affected simultaneously by the unobservable true translation quality and by some noise. If there is a variable that is related to the noise and independent of the true translation quality, the related noise can be eliminated by removing the effect of that variable from the observed score. Based on this causal hypothesis, this article studies the length bias problem of beam search for neural machine translation (NMT) and the input-related noise problem of translation quality estimation (QE). For the NMT length bias problem, we conduct experiments on four typical NMT tasks (Uyghur–Chinese, Chinese–English, English–German, and English–French) with datasets of different scales. Compared with previous approaches, the proposed causally motivated method is model-agnostic and does not require supervised training. For QE tasks, we conduct experiments on the WMT’20 submissions. Experimental results show that the denoised QE results achieve better Pearson correlation with human-assessed scores than the original submissions. Further analyses on the NMT and QE tasks also demonstrate the rationality of the empirical assumptions underlying our methods.
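The denoising idea in the abstract can be illustrated with a minimal sketch on synthetic data. It is not the article's implementation: the linear regressor, the synthetic noise model, and the names `remove_noise_effect`, `proxy`, and `true_quality` are all assumptions made for illustration. The sketch regresses the observed score on a variable that is related to the noise but independent of the true quality, then keeps the residual as the denoised score.

```python
import numpy as np


def remove_noise_effect(observed, noise_proxy, degree=1):
    """Subtract a least-squares estimate of E[observed | noise_proxy].

    Hypothetical helper: a polynomial fit stands in for whatever
    regressor one would actually choose.
    """
    observed = np.asarray(observed, dtype=float)
    noise_proxy = np.asarray(noise_proxy, dtype=float)
    # Regress the observed score on the noise-related variable.
    coeffs = np.polyfit(noise_proxy, observed, deg=degree)
    predicted = np.polyval(coeffs, noise_proxy)
    # Keep the residual, shifted back to the original mean so that
    # the denoised scores stay on a comparable scale.
    return observed - predicted + observed.mean()


# Synthetic setup (assumed, for illustration only).
rng = np.random.default_rng(0)
n = 2000
true_quality = rng.normal(size=n)   # unobservable in practice
proxy = rng.normal(size=n)          # independent of true_quality
noise = 0.8 * proxy + 0.1 * rng.normal(size=n)
observed = true_quality + noise     # what an evaluator would report

denoised = remove_noise_effect(observed, proxy)

# Pearson correlation with the (normally hidden) true quality.
r_before = np.corrcoef(observed, true_quality)[0, 1]
r_after = np.corrcoef(denoised, true_quality)[0, 1]
print(f"Pearson r before: {r_before:.3f}, after: {r_after:.3f}")
```

In this toy setting the true quality is known, so the gain in Pearson correlation can be checked directly. In a real evaluation only a reference signal such as human-assessed scores would be available, and the independence between the proxy variable and the true quality remains an empirical assumption.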


Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Machine translation


Published in
ACM Transactions on Asian and Low-Resource Language Information Processing Volume 22, Issue 5
May 2023
653 pages
ISSN:2375-4699
EISSN:2375-4702
DOI:10.1145/3596451
Editor:
Imed Zitouni
Google, USA
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 May 2023
- Online AM: 13 February 2023
- Accepted: 3 February 2023
- Revised: 11 November 2022
- Received: 7 April 2022

Author Tags
Neural machine translation
causal inference
half-sibling regression
quality estimation
Qualifiers
- research-article