A Document-Level Machine Translation Quality Estimation Model Based on Centering Theory

Chen, Yidong; Zhong, Enjun; Tong, Yiqi; Qiu, Yanru; Shi, Xiaodong

doi:10.1007/978-981-16-7512-6_1

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1464)

Included in the conference series: China Conference on Machine Translation

Abstract

Machine translation quality estimation (QE) aims to estimate the quality of machine translations without relying on gold-standard references. Current QE research focuses mainly on sentence-level models, which cannot capture discourse-related translation errors. To tackle this problem, this paper presents a novel document-level QE model based on Centering Theory (CT), a linguistic theory for assessing discourse coherence. We also construct and release an open-source Chinese-English corpus for document-level machine translation QE at https://github.com/ydc/cpqe to support further studies. Experimental results show that the proposed model significantly outperforms the baseline model.

Notes

1. https://radimrehurek.com/gensim/models/word2vec.html
2. https://github.com/chakki-works/seqeval
3. Available at https://github.com/ydc/cpqe

Acknowledgements

The authors would like to thank the three anonymous reviewers for their comments on this paper. This research was supported in part by the National Natural Science Foundation of China under Grant Nos. 62076211, U1908216 and 61573294 and the Outstanding Achievement Late Fund of the State Language Commission of China under Grant WT135-38.

Author information

Authors and Affiliations

Department of Artificial Intelligence, School of Informatics, Xiamen University, Xiamen, China
Yidong Chen, Enjun Zhong, Yiqi Tong, Yanru Qiu & Xiaodong Shi
Key Laboratory of Digital Protection and Intelligent Processing of Intangible Cultural Heritage of Fujian and Taiwan, Ministry of Culture and Tourism, Xiamen, China
Yidong Chen, Enjun Zhong, Yiqi Tong, Yanru Qiu & Xiaodong Shi
Institute of Artificial Intelligence, Beihang University, Beijing, China
Yiqi Tong

Corresponding author

Correspondence to Yidong Chen.

Editor information

Editors and Affiliations

Xiamen University, Xiamen, China
Jinsong Su
The University of Edinburgh, Edinburgh, UK
Rico Sennrich

Appendices

A Appendix

The inputs of the outer-extractor are the translation sentences mt, the preferred centers of the translation sentences mCp, the source sentences src, and the preferred centers of the source sentences sCp. Its outputs are the embeddings of the preferred centers, Emb, and the sentence relation features \(f_{outer}\). T is the number of sentences in the corpus.
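The input/output shape described above can be illustrated with a minimal sketch. This is not the authors' algorithm, which is not reproduced in this appendix: the function name `outer_extractor_sketch`, the lookup `table`, the concatenation used to build Emb, and the cosine similarity between consecutive preferred centers used as a stand-in for \(f_{outer}\) are all assumptions made for illustration. The sentences mt and src themselves are omitted, so the sketch operates only on the preferred centers.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def outer_extractor_sketch(
    m_cp: List[str],
    s_cp: List[str],
    table: Dict[str, Vector],
    dim: int,
) -> Tuple[List[Vector], List[float]]:
    """Hypothetical stand-in for the outer-extractor interface.

    m_cp[t] and s_cp[t] are the preferred centers of the t-th translation
    and source sentences. Returns (Emb, f_outer) with one entry per sentence.
    """
    zero = [0.0] * dim  # fallback for centers missing from the table
    emb: List[Vector] = []
    f_outer: List[float] = []
    for t in range(len(m_cp)):
        m_vec = table.get(m_cp[t], zero)
        s_vec = table.get(s_cp[t], zero)
        # Assumption: Emb is the concatenation of both center embeddings.
        emb.append(m_vec + s_vec)
        # Assumption: the relation feature compares this sentence's
        # preferred center with the previous sentence's preferred center.
        prev = table.get(m_cp[t - 1], zero) if t > 0 else zero
        f_outer.append(cosine(m_vec, prev))
    return emb, f_outer
```

In this toy version, a document whose consecutive sentences keep the same preferred center yields relation features close to 1, while a shift of center lowers the value; the first sentence has no predecessor and receives 0.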

B Appendix

Table 5. Parameters of the BERT-BiLSTM-CRF model

For the preferred center extraction model, we use BERT-Base-Chinese as the Chinese pre-trained model and BERT-Base as the English pre-trained model. Some hyper-parameters are fixed: the number of Transformer layers is 12, the hidden size of BERT is 768, and the number of heads in multi-head attention is 12. The other parameters are shown in Table 5.
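As a small worked check of the fixed values above, the sketch below records them in a configuration object and derives the per-head dimension (768 / 12 = 64). The class name `BertFixedConfig` and its field names are hypothetical and do not correspond to any particular library's API; the model names are copied from the text as plain labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BertFixedConfig:
    """Hypothetical container for the fixed hyper-parameters in the text."""

    pretrained_name: str
    num_layers: int = 12
    hidden_size: int = 768
    num_attention_heads: int = 12

    @property
    def head_dim(self) -> int:
        """Size of each attention head; hidden size must divide evenly."""
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by the head count")
        return self.hidden_size // self.num_attention_heads


# One configuration per language, labeled as in the text.
ZH_CONFIG = BertFixedConfig("BERT-Base-Chinese")
EN_CONFIG = BertFixedConfig("BERT-Base")
```

Both configurations share the same fixed values, so each attention head works on a 64-dimensional slice of the 768-dimensional hidden state.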

C Appendix

Table 6. Hyper-parameters of the baseline predictor

Table 7. Hyper-parameters of the baseline estimator

Compared with the baseline model, our CpQE model additionally integrates an outer-extractor. All other parameters are the same as in the baseline model. The parameters of the baseline are shown in Tables 6 and 7. The dimension of the Word2Vec embeddings in the outer-extractor is 512.

Cite this paper

Chen, Y., Zhong, E., Tong, Y., Qiu, Y., Shi, X. (2021). A Document-Level Machine Translation Quality Estimation Model Based on Centering Theory. In: Su, J., Sennrich, R. (eds) Machine Translation. CCMT 2021. Communications in Computer and Information Science, vol 1464. Springer, Singapore. https://doi.org/10.1007/978-981-16-7512-6_1

DOI: https://doi.org/10.1007/978-981-16-7512-6_1
Published: 30 October 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-7511-9
Online ISBN: 978-981-16-7512-6
eBook Packages: Computer Science, Computer Science (R0)
