Abstract
Translations are hard to evaluate objectively and accurately, which limits the applications of machine translation. In this article, we assume that this difficulty is caused by noise interference during translation evaluation, and we address the problem from the perspective of causal inference. We assume that the observable translation score is affected simultaneously by the unobservable true translation quality and by some noise. If there is a variable that is related to the noise and independent of the true translation quality, the related noise can be eliminated by removing the effect of that variable from the observed score. Based on this causal hypothesis, this article studies the length bias problem of beam search for neural machine translation (NMT) and the input-related noise problem of translation quality estimation (QE). For the NMT length bias problem, we conduct experiments on four typical NMT tasks (Uyghur–Chinese, Chinese–English, English–German, and English–French) with datasets of different scales. Compared with previous approaches, the proposed causally motivated method is model-agnostic and does not require supervised training. For the QE tasks, we conduct experiments on the WMT’20 submissions. Experimental results show that the denoised QE results achieve better Pearson correlation with human-assessed scores than the original submissions. Further analyses on the NMT and QE tasks also support the empirical assumptions underlying our methods.
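The denoising idea in the abstract can be illustrated with a minimal sketch on synthetic data. It is not the authors' implementation: the abstract does not specify the regressor, so ordinary least squares is assumed here, and the names `denoise_scores`, `true_quality`, and `nuisance` are hypothetical. The observed score is modeled as true quality plus noise, the noise depends on a variable that is independent of the true quality, and the part of the score predictable from that variable is subtracted.

```python
import numpy as np


def denoise_scores(observed, nuisance):
    """Subtract the part of `observed` that is linearly predictable from `nuisance`.

    Assumption (not from the source): a linear least-squares fit with an
    intercept stands in for whatever regressor the paper actually uses.
    """
    observed = np.asarray(observed, dtype=float)
    # Design matrix: intercept column plus the nuisance variable(s).
    design = np.column_stack([np.ones(len(observed)), np.asarray(nuisance, dtype=float)])
    coef, *_ = np.linalg.lstsq(design, observed, rcond=None)
    # The residual is the score with the nuisance-related component removed.
    return observed - design @ coef


def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


# Synthetic data (purely illustrative, not experimental results).
rng = np.random.default_rng(0)
n = 2000
true_quality = rng.normal(size=n)   # unobservable in practice
nuisance = rng.normal(size=n)       # related to the noise, independent of quality
noise = 1.5 * nuisance + 0.1 * rng.normal(size=n)
observed = true_quality + noise

denoised = denoise_scores(observed, nuisance)

r_before = pearson(observed, true_quality)
r_after = pearson(denoised, true_quality)
print(f"correlation with true quality before: {r_before:.3f}, after: {r_after:.3f}")
```

In this toy setting the residual should track the true quality more closely than the raw score does, because the subtracted component carries information only about the noise. The benefit depends on the independence assumption stated in the abstract: if the nuisance variable were correlated with true quality, the subtraction would also remove part of the signal.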
Index Terms
- Approximating to the Real Translation Quality for Neural Machine Translation via Causal Motivated Methods