Abstract
Self-attention networks (SAN) have achieved promising performance in a variety of NLP tasks, e.g. neural machine translation (NMT), as they can directly build dependencies among words, but they are weaker at learning positional information than recurrent neural networks (RNN). Two natural questions arise: (1) Can we design an RNN-based component that is directly guided by syntactic dependencies? (2) Does such a syntax-enhanced sequence modeling component benefit existing NMT architectures, e.g. RNN-based NMT and Transformer-based NMT? To answer these questions, we propose a simple yet effective recurrent graph syntax encoder, dubbed RGSE, which exploits off-the-shelf syntactic dependencies together with its intrinsic recurrence, so that RGSE models syntactic dependencies and sequential information (i.e. word order) simultaneously. Experimental studies on several neural machine translation tasks demonstrate that RNN and Transformer models equipped with RGSE gain consistent and significant improvements over several strong syntax-aware baselines, with only a minuscule increase in parameters. Extensive analysis further shows that RGSE preserves syntactic and semantic information better than SAN and, in addition, is more robust to syntactic noise than existing syntax-aware NMT models.
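To make the idea concrete, below is a minimal, self-contained sketch (one possible realisation under our own assumptions, not the authors' exact RGSE implementation) in which a GRU cell supplies the recurrence over word order and a normalised aggregation over dependency arcs supplies the syntactic signal; the names RecurrentGraphSyntaxEncoder and dep_adj are illustrative.

import torch
import torch.nn as nn

class RecurrentGraphSyntaxEncoder(nn.Module):
    """Toy encoder combining recurrence (word order) with dependency-graph messages (syntax)."""
    def __init__(self, d_model):
        super().__init__()
        self.graph_proj = nn.Linear(d_model, d_model)  # projects the dependency-neighbour message
        self.cell = nn.GRUCell(2 * d_model, d_model)   # recurrence over the token sequence

    def forward(self, x, dep_adj):
        # x: (batch, seq_len, d_model) token embeddings
        # dep_adj: (batch, seq_len, seq_len) 0/1 dependency adjacency from an external parser
        batch, seq_len, d_model = x.shape
        deg = dep_adj.sum(-1, keepdim=True).clamp(min=1.0)
        graph_msg = self.graph_proj(torch.bmm(dep_adj / deg, x))  # average over dependency neighbours
        h = x.new_zeros(batch, d_model)
        outputs = []
        for t in range(seq_len):
            # Each step conditions on the token itself plus its syntactic context,
            # so word order and dependency structure are modelled simultaneously.
            h = self.cell(torch.cat([x[:, t], graph_msg[:, t]], dim=-1), h)
            outputs.append(h)
        return torch.stack(outputs, dim=1)  # (batch, seq_len, d_model)

# Toy usage: 2 sentences of 5 tokens, 8-dimensional embeddings, identity "dependency" graph.
enc = RecurrentGraphSyntaxEncoder(d_model=8)
states = enc(torch.randn(2, 5, 8), torch.eye(5).expand(2, 5, 5))
print(states.shape)  # torch.Size([2, 5, 8])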






Notes
The syntactic dependency label remains on each substring if a word is split by BPE (a small illustrative sketch of this rule follows these notes).
See footnote 4.
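The following is a small illustrative sketch (an assumed preprocessing helper, not the authors' code) of the label-copying rule in the first note: every BPE piece of a split word inherits the dependency label of that word, with non-final pieces marked by the '@@' continuation suffix used by the subword-nmt toolkit.

def propagate_dep_labels(dep_labels, bpe_tokens):
    # dep_labels: one dependency label per original word, in order.
    # bpe_tokens: BPE segmentation of the same sentence; non-final pieces end with '@@'.
    labelled, word_idx = [], 0
    for tok in bpe_tokens:
        labelled.append((tok, dep_labels[word_idx]))  # every piece keeps the word's label
        if not tok.endswith("@@"):                    # final piece of the current word
            word_idx += 1
    return labelled

print(propagate_dep_labels(
    ["nsubj", "case", "det", "nmod"],
    ["re@@", "sumption", "of", "the", "session"],
))
# [('re@@', 'nsubj'), ('sumption', 'nsubj'), ('of', 'case'), ('the', 'det'), ('session', 'nmod')]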