
Improving neural machine translation using gated state network and focal adaptive attention network

  • Review
  • Published:
Neural Computing and Applications

Abstract

The currently predominant token-to-token attention mechanism has demonstrated its ability to capture word dependencies in neural machine translation. This mechanism treats a sequence as a bag of tokens and computes the similarity between tokens without considering their intrinsic interactions. In this paper, we argue that this attention mechanism may miss the opportunity to take advantage of state information accumulated across multiple time steps. Thus, we propose a Gated State Network, which regulates the flow of state information with sequential characteristics. We also incorporate a Focal Adaptive Attention Network, which uses a Gaussian distribution to concentrate the attention distribution around a predicted focal position and its neighborhood. Experimental results on the WMT'14 English–German and WMT'17 Chinese–English translation tasks demonstrate the effectiveness of the proposed approach.
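The focal attention component of the abstract lends itself to a short illustration. Below is a minimal NumPy sketch of how a Gaussian bias can concentrate an attention distribution around a predicted focal position and its neighborhood; the function name, the array shapes, and the assumption that a focal position and window width are already predicted per query are illustrative choices, not the authors' exact Focal Adaptive Attention or Gated State Network formulation.

```python
import numpy as np

def focal_gaussian_attention(scores, focal_pos, window, eps=1e-9):
    """Concentrate attention around a predicted focal source position.

    scores:    (tgt_len, src_len) raw attention logits
    focal_pos: (tgt_len,) predicted focal source position per target query
    window:    (tgt_len,) predicted window width (std. dev.) per target query
    """
    src_positions = np.arange(scores.shape[1])          # (src_len,)
    # Gaussian penalty: zero at the focal position, increasingly
    # negative for source positions farther away from it.
    bias = -((src_positions[None, :] - focal_pos[:, None]) ** 2) / (
        2.0 * window[:, None] ** 2 + eps
    )
    biased = scores + bias
    # Standard softmax over source positions.
    exp = np.exp(biased - biased.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

# Example: two target queries over a six-token source sentence.
scores = np.zeros((2, 6))
attn = focal_gaussian_attention(scores,
                                focal_pos=np.array([1.0, 4.0]),
                                window=np.array([1.0, 2.0]))
print(attn.round(3))  # probability mass peaks near positions 1 and 4
```

With uniform logits, the Gaussian bias alone peaks each query's distribution at its predicted focal position, which is the qualitative behavior the abstract describes; in the paper the focal position and window are learned, and the gated state network additionally regulates how state information flows across time steps.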


Notes

  1. https://pypi.org/project/mosestokenizer/.

  2. https://github.com/fxsjy/jieba.

  3. https://github.com/THUNLP-MT/THUMT.

  4. https://pypi.org/project/sacrebleu/.

  5. https://translate.google.cn/.


Acknowledgements

This work was partially supported by the National Key Research and Development Program of China under Grant 2018AAA0100202 and the National Science Foundation of China under Grant 61976043.

Author information


Corresponding author

Correspondence to Hong Qu.

Ethics declarations

Conflict of interest

The authors declare that there is no known conflict of interest associated with this manuscript, and there has been no significant financial support for this work that could have influenced its outcome.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Huang, L., Chen, W., Liu, Y. et al. Improving neural machine translation using gated state network and focal adaptive attention network. Neural Comput & Applic 33, 15955–15967 (2021). https://doi.org/10.1007/s00521-021-06444-2


  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00521-021-06444-2
