Abstract
Machine translation (MT) quality estimation (QE) aims to automatically predict the quality of MT outputs without any references. State-of-the-art solutions are mostly fine-tuned from a pre-trained model in a multi-task framework (i.e., jointly training sentence-level and word-level QE). In this paper, we propose an alternative multi-task framework in which post-editing results are utilized for sentence-level QE over an mBART-based encoder-decoder model. We show that the post-editing sub-task is much more informative and that mBART is superior to other pre-trained models. Experiments on the WMT2021 English-German and English-Chinese QE datasets show that the proposed method achieves 1.2%–2.1% improvements over a strong sentence-level QE baseline.
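The abstract describes a joint objective: a sentence-level QE regression head trained together with a post-editing sub-task on an encoder-decoder model. The following is a minimal numpy sketch of what such a joint loss could look like, assuming HTER-style sentence scores, a token-level cross-entropy over the post-edited reference, and an interpolation weight `lam`; all names and the exact weighting are illustrative, not the paper's formulation.

```python
import numpy as np

def sentence_qe_loss(pred_score, gold_hter):
    """Sentence-level QE: squared error against the gold HTER score."""
    return (pred_score - gold_hter) ** 2

def post_edit_loss(token_logits, gold_pe_ids):
    """Post-editing sub-task: mean cross-entropy of the decoder
    predicting each token of the post-edited reference."""
    # numerically stable softmax over the vocabulary axis
    exp = np.exp(token_logits - token_logits.max(axis=-1, keepdims=True))
    probs = exp / exp.sum(axis=-1, keepdims=True)
    nll = -np.log(probs[np.arange(len(gold_pe_ids)), gold_pe_ids])
    return nll.mean()

def multitask_loss(pred_score, gold_hter, token_logits, gold_pe_ids, lam=1.0):
    """Joint objective: sentence-level QE plus a weighted post-editing term."""
    return (sentence_qe_loss(pred_score, gold_hter)
            + lam * post_edit_loss(token_logits, gold_pe_ids))

# toy example: a post-edited reference of 4 tokens over a 6-word vocabulary
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 6))       # decoder logits per target position
gold_ids = np.array([1, 3, 0, 5])      # gold post-edit token ids
loss = multitask_loss(pred_score=0.35, gold_hter=0.20,
                      token_logits=logits, gold_pe_ids=gold_ids)
print(loss)
```

In a real system the two heads would share the mBART encoder-decoder parameters, so gradients from the post-editing term shape the representations the QE head reads.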
Acknowledgement
This work is partially funded by the National Key Research and Development Program of China (No. 2020AAA0108000) and by the Key Project of the National Natural Science Foundation of China (No. U1908216).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Yuan, B., Li, Y., Chen, K., Lu, H., Yang, M., Cao, H. (2022). An Improved Multi-task Approach to Pre-trained Model Based MT Quality Estimation. In: Xiao, T., Pino, J. (eds) Machine Translation. CCMT 2022. Communications in Computer and Information Science, vol 1671. Springer, Singapore. https://doi.org/10.1007/978-981-19-7960-6_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-7959-0
Online ISBN: 978-981-19-7960-6
eBook Packages: Computer Science (R0)