
Bilingual attention based neural machine translation


Abstract

In recent years, Recurrent Neural Network based Neural Machine Translation (RNN-based NMT), equipped with an attention mechanism from the decoder to the encoder, has achieved great advances and exhibited strong performance on many language pairs. However, little work has been done on attention for the target side, which has the potential to further improve NMT. To address this issue, in this paper we propose a novel bilingual attention based NMT model, whose bilingual attention mechanism exploits the decoding history and enables the model to dynamically select and exploit both source-side and target-side information. Compared with previous RNN-based NMT models, our model has two advantages. First, it dynamically controls the ratios at which the source and target contexts contribute to the generation of the next target word, so that the weakly induced structural relations on both sides can be exploited for NMT. Second, through short-cut connections, training errors can be back-propagated directly, which effectively alleviates the vanishing and exploding gradient problems. Experimental results and in-depth analyses on Chinese-English, English-German, and English-French translation tasks show that, with proper configurations, our model significantly surpasses the dominant NMT model, Transformer. Notably, our proposed model won first place in the English-Chinese translation task of WMT2018.
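To make the mechanism described above concrete, the following is a minimal, illustrative sketch rather than the authors' released implementation: it shows, under assumed module names and dimensions, how a gate can balance a source-side attention context against a target-side context computed over the decoding history, with a short-cut (residual) connection around the combined output.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BilingualAttentionSketch(nn.Module):
    """Illustrative sketch (not the paper's exact architecture): one decoder step
    attends over the source annotations and over its own decoding history, then
    gates the two contexts before predicting the next target word."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.src_attn = nn.Linear(hidden_size * 2, 1)        # additive source attention (assumed form)
        self.tgt_attn = nn.Linear(hidden_size * 2, 1)        # attention over past decoder states (assumed form)
        self.gate = nn.Linear(hidden_size * 3, hidden_size)  # controls the source/target mixing ratio
        self.out = nn.Linear(hidden_size * 2, hidden_size)

    def forward(self, dec_state, src_states, tgt_history):
        # dec_state: (batch, hidden); src_states: (batch, src_len, hidden);
        # tgt_history: (batch, tgt_len, hidden), i.e. previously generated decoder states.
        def attend(states, scorer):
            query = dec_state.unsqueeze(1).expand_as(states)
            scores = scorer(torch.cat([states, query], dim=-1)).squeeze(-1)
            weights = F.softmax(scores, dim=-1)
            return torch.bmm(weights.unsqueeze(1), states).squeeze(1)

        src_ctx = attend(src_states, self.src_attn)   # source-side context
        tgt_ctx = attend(tgt_history, self.tgt_attn)  # target-side (decoding-history) context

        # The gate dynamically decides how much each context contributes
        # to generating the next target word.
        g = torch.sigmoid(self.gate(torch.cat([dec_state, src_ctx, tgt_ctx], dim=-1)))
        mixed_ctx = g * src_ctx + (1.0 - g) * tgt_ctx

        # Short-cut connection: the decoder state is added back so training
        # errors can be back-propagated directly, easing gradient issues.
        return dec_state + self.out(torch.cat([dec_state, mixed_ctx], dim=-1))
```

In this sketch, the gate g lets the network decide, per target word, how much to rely on the source sentence versus the already-generated target prefix, which is the intuition behind the dynamic control described in the abstract.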


Notes

  1. The corpora include LDC2002E18, LDC2003E07, LDC2003E14, Hansards portion of LDC2004T07, LDC2004T08 and LDC2005T06.

  2. https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/multi-bleu.perl
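This is the BLEU scoring script used for evaluation. As a minimal illustration of how it is commonly invoked (the file names below are hypothetical; the script reads the tokenized hypothesis from standard input and takes tokenized reference files as arguments):

```python
import subprocess

# Hypothetical file names for illustration only.
with open("hypotheses.tok.txt", "rb") as hyp:
    result = subprocess.run(
        ["perl", "multi-bleu.perl", "references.tok.txt"],
        stdin=hyp,            # hypothesis translations on stdin
        capture_output=True,  # collect the "BLEU = ..." summary line
        check=True,
    )
print(result.stdout.decode())
```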


Acknowledgements

The authors were supported by the National Key Research and Development Program of China (No. 2020AAA0108004), the National Natural Science Foundation of China (No. 61672440), the Natural Science Foundation of Fujian Province of China (No. 2020J06001), the Youth Innovation Fund of Xiamen (Grant No. 3502Z20206059), and the Industry-University-Research Project of Xiamen City (3502Z20203002). Fei Long and Jinsong Su are corresponding authors. We also thank the anonymous reviewers for their insightful comments.


Corresponding author

Correspondence to Fei Long.



Cite this article

Kang, L., He, S., Wang, M. et al. Bilingual attention based neural machine translation. Appl Intell 53, 4302–4315 (2023). https://doi.org/10.1007/s10489-022-03563-8
