
Recurrent graph encoder for syntax-aware neural machine translation

  • Original Article
  • Published:
International Journal of Machine Learning and Cybernetics

Abstract

Self-attention networks (SAN) have achieved promising performance in a variety of NLP tasks, e.g., neural machine translation (NMT), as they can directly model dependencies among words. However, they are weaker than recurrent neural networks (RNN) at learning positional information. Two natural questions arise: (1) Can we design an RNN-based component that is directly guided by syntactic dependencies? (2) Does such a syntax-enhanced sequence-modeling component benefit existing NMT architectures, e.g., RNN-based NMT and Transformer-based NMT? To answer these questions, we propose a simple yet effective recurrent graph syntax encoder, dubbed RGSE, which exploits off-the-shelf syntactic dependencies together with the intrinsic recurrence property of RNNs, so that RGSE models syntactic dependencies and sequential information (i.e., word order) simultaneously. Experiments on several neural machine translation tasks demonstrate that RGSE-equipped RNN and Transformer models achieve consistent and significant improvements over several strong syntax-aware baselines, with only a minuscule increase in parameters. Extensive analysis further shows that RGSE preserves syntactic and semantic information better than SAN and is more robust to syntactic noise than existing syntax-aware NMT models.
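The full architecture is described only in the body of the paper, so the sketch below is merely an assumption-laden illustration of the idea stated in the abstract: interleave a recurrence over word order with aggregation over dependency-graph neighbours obtained from an external parser. The class name, the gating choice (a GRU cell), and mean-pooling over neighbours are hypothetical, not the authors' exact RGSE design.

```python
# Minimal, hypothetical sketch of a recurrent graph syntax encoder layer.
# The gating, neighbour aggregation, and names are illustrative assumptions,
# NOT the authors' exact RGSE formulation.
import torch
import torch.nn as nn


class RecurrentGraphSyntaxEncoder(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        # GRU recurrence carries sequential information (word order).
        self.cell = nn.GRUCell(2 * hidden_size, hidden_size)
        # Projection for the syntactic-neighbour context (graph part).
        self.neighbor_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, word_states: torch.Tensor, adjacency: torch.Tensor) -> torch.Tensor:
        """word_states: (seq_len, hidden) token representations.
        adjacency: (seq_len, seq_len) 0/1 float matrix from an off-the-shelf
        dependency parse; adjacency[i, j] = 1 if token j is a head or
        dependent of token i."""
        seq_len, hidden = word_states.size()
        h = word_states.new_zeros(1, hidden)           # recurrent state (batch of 1)
        outputs = []
        for t in range(seq_len):
            # Average the states of token t's syntactic neighbours.
            mask = adjacency[t].unsqueeze(-1)          # (seq_len, 1)
            denom = mask.sum().clamp(min=1.0)
            neighbors = (mask * word_states).sum(dim=0) / denom
            syntax_ctx = torch.tanh(self.neighbor_proj(neighbors))
            # Recurrent update over the token plus its syntactic context.
            x = torch.cat([word_states[t], syntax_ctx], dim=-1).unsqueeze(0)
            h = self.cell(x, h)
            outputs.append(h.squeeze(0))
        return torch.stack(outputs, dim=0)             # syntax- and order-aware states
```

A layer of this kind could, in principle, be stacked on top of either an RNN or a Transformer encoder, which matches the abstract's claim that RGSE benefits both architectures.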


Notes

  1. http://www.statmt.org/wmt16/translation-task.html.

  2. https://github.com/tensorflow/models/tree/master/research/syntaxnet.

  3. http://www.tkl.iis.u-tokyo.ac.jp/ynaga/jdepp/.

  4. The syntactic dependency label remains on each substring if a word is split by BPE (see the sketch after these notes).

  5. See footnote 4.
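Footnote 4 only states the bookkeeping rule: when BPE splits a word into several subwords, every subword inherits that word's dependency label. A tiny, assumed helper illustrating this (the function name and the `bpe_segment` interface are hypothetical):

```python
def propagate_labels_to_subwords(words, labels, bpe_segment):
    """Copy each word's dependency label onto all of its BPE subwords.

    words:       source tokens, e.g. ["translation", "models"]
    labels:      parallel dependency labels, e.g. ["nsubj", "root"]
    bpe_segment: callable mapping a word to its subword pieces,
                 e.g. "translation" -> ["trans@@", "lation"]  (assumed interface)
    """
    subwords, sub_labels = [], []
    for word, label in zip(words, labels):
        pieces = bpe_segment(word)
        subwords.extend(pieces)
        sub_labels.extend([label] * len(pieces))  # the label remains on each piece
    return subwords, sub_labels
```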


Author information


Corresponding author

Correspondence to Longyue Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ding, L., Wang, L. & Liu, S. Recurrent graph encoder for syntax-aware neural machine translation. Int. J. Mach. Learn. & Cyber. 14, 1053–1062 (2023). https://doi.org/10.1007/s13042-022-01682-9
