Removing Input Confounder for Translation Quality Estimation via a Causal Motivated Method

Shi, Xuewen; Huang, Heyan; Jian, Ping; Tang, Yi-Kun

doi:10.1007/978-3-030-85896-4_28

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12858)

Included in the following conference series:

Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data

Abstract

Most state-of-the-art quality estimation (QE) systems built on neural networks achieve promising performance on benchmark datasets. However, their performance is easily influenced by inherent features of the model input, such as the length of the input sequence or the number of unseen tokens. In this paper, we introduce a causal-inference-based method to eliminate the negative impact that these input characteristics have on a QE system. Specifically, we propose an iterative denoising framework for multiple confounding features, in which the confounder elimination at each iteration step is implemented by a method based on Half-Sibling Regression. We conduct our experiments on the official datasets and submissions of the WMT 2020 Quality Estimation Shared Task on Sentence-Level Direct Assessment. Experimental results show that the denoised QE results achieve higher Pearson’s correlation with human assessments than the original submissions.
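
The iterative denoising idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `remove_confounder` and `iterative_denoise`, the choice of an ordinary least-squares regressor, and the synthetic data are all assumptions made for illustration. Each step regresses the current QE scores on one observed input feature and keeps the residual, in the spirit of Half-Sibling Regression.

```python
import numpy as np

def remove_confounder(scores, feature):
    """One denoising step (hypothetical helper): subtract the part of
    `scores` that is linearly predictable from a single input feature."""
    scores = np.asarray(scores, dtype=float)
    feature = np.asarray(feature, dtype=float)
    # Least-squares fit of scores on [1, feature]; linearity is an assumption.
    design = np.column_stack([np.ones_like(feature), feature])
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    predicted = design @ coef
    # Keep the residual, shifted back so the mean score is unchanged.
    return scores - predicted + scores.mean()

def iterative_denoise(scores, features):
    """Apply the single-feature step once per confounding feature, in order."""
    denoised = np.asarray(scores, dtype=float)
    for feature in features:
        denoised = remove_confounder(denoised, feature)
    return denoised

# Synthetic illustration only: these numbers are not from the paper.
rng = np.random.default_rng(0)
n = 500
quality = rng.normal(size=n)            # stand-in for human assessments
length = rng.integers(5, 61, size=n)    # input sequence length
unseen = rng.poisson(2.0, size=n)       # number of unseen tokens
# Simulated QE output that is biased by both input features.
raw = quality - 0.05 * length - 0.3 * unseen + rng.normal(scale=0.1, size=n)

denoised = iterative_denoise(raw, [length, unseen])
print("Pearson r, raw vs. quality:     ", np.corrcoef(raw, quality)[0, 1])
print("Pearson r, denoised vs. quality:", np.corrcoef(denoised, quality)[0, 1])
```

Because the features are handled one at a time, only the feature removed last is guaranteed to be uncorrelated with the final scores. If the features are correlated with one another, an earlier one can regain a small correlation after later steps.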

Notes

1. http://www.statmt.org/wmt20/quality-estimation-task.html

References

Barrault, L., et al.: Findings of the 2020 conference on machine translation (WMT20). In: Proceedings of the Fifth Conference on Machine Translation, pp. 1–55. Association for Computational Linguistics, Online (November 2020)
Koehn, P., Knowles, R.: Six challenges for neural machine translation. In: Proceedings of the First Workshop on Neural Machine Translation, pp. 28–39. Association for Computational Linguistics, Vancouver (August 2017)
Ott, M., Auli, M., Grangier, D., Ranzato, M.: Analyzing uncertainty in neural machine translation. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10–15, 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3953–3962. PMLR (2018)
Schölkopf, B., et al.: Modeling confounding by half-sibling regression. Proc. Natl. Acad. Sci. USA 113(27), 7391–7398 (2016)
Specia, L., Blain, F., Fomicheva, M., Fonseca, E., Chaudhary, V., Guzmán, F., Martins, A.F.T.: Findings of the WMT 2020 shared task on quality estimation. In: Proceedings of the Fifth Conference on Machine Translation, pp. 743–764. Association for Computational Linguistics, Online (November 2020)
Specia, L., Turchi, M., Cancedda, N., Cristianini, N., Dymetman, M.: Estimating the sentence-level quality of machine translation systems. In: Proceedings of the 13th Annual Conference of the European Association for Machine Translation. European Association for Machine Translation, Barcelona, Spain (May 14–15 2009)

Acknowledgments

This work is supported by the National Key Research and Development Program of China (Grant No. 2017YFB1002103) and the National Natural Science Foundation of China (No. 61732005).

Author information

Authors and Affiliations

School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Xuewen Shi, Heyan Huang, Ping Jian & Yi-Kun Tang
Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications, Beijing, 100081, China
Xuewen Shi, Heyan Huang, Ping Jian & Yi-Kun Tang

Corresponding author

Correspondence to Ping Jian.

Editor information

Editors and Affiliations

University of Macau, Macau, China
Leong Hou U
University of Caen Normandie, Caen, France
Marc Spaniol
Osaka University, Osaka, Japan
Yasushi Sakurai
South China University of Technology, Guangzhou, China
Junying Chen

Cite this paper

Shi, X., Huang, H., Jian, P., Tang, YK. (2021). Removing Input Confounder for Translation Quality Estimation via a Causal Motivated Method. In: U, L.H., Spaniol, M., Sakurai, Y., Chen, J. (eds) Web and Big Data. APWeb-WAIM 2021. Lecture Notes in Computer Science, vol 12858. Springer, Cham. https://doi.org/10.1007/978-3-030-85896-4_28

DOI: https://doi.org/10.1007/978-3-030-85896-4_28
Published: 19 August 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85895-7
Online ISBN: 978-3-030-85896-4
eBook Packages: Computer Science, Computer Science (R0)
