
Document-Level Neural Machine Translation with Hierarchical Modeling of Global Context

  • Regular Paper
  • Published:
Journal of Computer Science and Technology

Abstract

Document-level machine translation (MT) remains challenging because it is difficult to exploit document-level global context efficiently during translation. In this paper, we propose a hierarchical model that learns global context for document-level neural machine translation (NMT): a sentence encoder captures intra-sentence dependencies, and a document encoder models inter-sentence consistency and coherence at the document level. With this hierarchical architecture, we feed the extracted document-level global context back to each word in a top-down fashion, so that different translations of a word can be distinguished according to its specific surrounding context. Notably, we compare three popular attention functions during this backward-distribution phase to examine how our model distributes global context information. In addition, since large-scale in-domain document-level parallel corpora are usually unavailable, we adopt a two-step training strategy that combines a large-scale corpus of out-of-domain parallel sentence pairs with a small-scale corpus of in-domain parallel document pairs to achieve domain adaptability. On Chinese-English and English-German corpora, our model significantly improves over the Transformer baseline by 4.5 BLEU points on average, which demonstrates the effectiveness of the proposed hierarchical model for document-level NMT.
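The hierarchical idea in the abstract can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation: all module names, dimensions, the mean-pooling step, and the gating layer are our own assumptions; the paper additionally compares three attention functions for the backward-distribution phase, of which only scaled dot-product attention is shown here.

```python
import torch
import torch.nn as nn


class HierarchicalGlobalContext(nn.Module):
    """Sketch of a hierarchical global-context module: a sentence encoder
    captures intra-sentence dependencies, a document encoder models
    inter-sentence dependencies over pooled sentence vectors, and the
    resulting global context is distributed back to every word."""

    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        sent_layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=128, batch_first=True)
        doc_layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=128, batch_first=True)
        self.sent_encoder = nn.TransformerEncoder(sent_layer, num_layers=1)
        self.doc_encoder = nn.TransformerEncoder(doc_layer, num_layers=1)
        self.gate = nn.Linear(2 * d_model, d_model)  # assumed gating scheme

    def forward(self, doc):
        # doc: (n_sents, sent_len, d_model) word embeddings of one document
        word_states = self.sent_encoder(doc)            # intra-sentence context
        sent_vecs = word_states.mean(dim=1)             # pool words -> sentence vectors
        doc_states = self.doc_encoder(sent_vecs.unsqueeze(0)).squeeze(0)

        # Backward distribution: each word queries the document-level states
        # with scaled dot-product attention (one of the three variants).
        scores = word_states @ doc_states.t() / doc_states.size(-1) ** 0.5
        ctx = torch.softmax(scores, dim=-1) @ doc_states  # (n_sents, len, d_model)

        # Gate the global context into each word representation (top-down).
        g = torch.sigmoid(self.gate(torch.cat([word_states, ctx], dim=-1)))
        return word_states + g * ctx
```

A caller would run this once per document and pass the enriched word states to a standard Transformer decoder; the two-step training strategy would first pre-train the sentence-level components on the out-of-domain sentence pairs, then fine-tune the full model on the in-domain document pairs.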



Author information

Correspondence to Guo-Dong Zhou.

Supplementary Information

ESM 1

(PDF 129 kb)


About this article


Cite this article

Tan, X., Zhang, LY. & Zhou, GD. Document-Level Neural Machine Translation with Hierarchical Modeling of Global Context. J. Comput. Sci. Technol. 37, 295–308 (2022). https://doi.org/10.1007/s11390-021-0286-3


