Abstract
The currently predominant token-to-token attention mechanism has demonstrated its ability to capture word dependencies in neural machine translation. However, this mechanism treats a sequence as a bag of tokens and computes the similarity between tokens without considering their intrinsic interactions. In this paper, we argue that this attention mechanism may miss the opportunity to take advantage of state information accumulated over multiple time steps. We therefore propose a Gated State Network, which manipulates the flow of state information with sequential characteristics. We also incorporate a Focal Adaptive Attention Network, which uses a Gaussian distribution to concentrate the attention distribution on a predicted focal position and its neighborhood. Experimental results on the WMT’14 English–German and WMT’17 Chinese–English translation tasks demonstrate the effectiveness of the proposed approach.
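To make the second idea concrete, the sketch below illustrates (under our own assumptions, not the authors' exact formulation) how an attention distribution can be concentrated around a predicted focal position with a Gaussian bias: the raw attention logits are penalized by the squared distance from the focal position before the softmax. The function name, tensor shapes, and the way the focal position and window are supplied are hypothetical; in the paper these quantities are predicted by the model.

```python
import torch
import torch.nn.functional as F

def focal_gaussian_attention(scores, focal_pos, window, key_len):
    """Illustrative sketch of Gaussian-focused attention (not the paper's exact method).

    scores:    (batch, query_len, key_len) unnormalized attention logits
    focal_pos: (batch, query_len) predicted focal position in [0, key_len)
    window:    (batch, query_len) predicted spread (standard deviation, > 0)
    """
    positions = torch.arange(key_len, dtype=scores.dtype, device=scores.device)
    # Squared distance of every key position from the predicted focal position.
    dist_sq = (positions.view(1, 1, -1) - focal_pos.unsqueeze(-1)) ** 2
    # Gaussian penalty: keys far from the focal position are down-weighted.
    gaussian_bias = -dist_sq / (2.0 * window.unsqueeze(-1) ** 2)
    return F.softmax(scores + gaussian_bias, dim=-1)

# Toy usage: 1 sentence, 4 target positions attending over 6 source tokens.
scores = torch.randn(1, 4, 6)
focal_pos = torch.tensor([[1.0, 2.0, 3.5, 4.0]])  # hypothetical predicted centers
window = torch.full((1, 4), 2.0)                  # hypothetical predicted spreads
attn = focal_gaussian_attention(scores, focal_pos, window, key_len=6)
print(attn.shape)  # torch.Size([1, 4, 6]); each row sums to 1
```

The effect is that each target position still attends to all source tokens, but probability mass is pulled toward the neighborhood of its focal position, which is the behavior the abstract describes.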
Acknowledgements
This work was partially supported by the National Key Research and Development Program of China under Grant 2018AAA0100202 and the National Science Foundation of China under Grant 61976043.
Ethics declarations
Conflict of interest
The authors declare that there is no known conflict of interest associated with this manuscript, and there has been no significant financial support for this work that could have influenced its outcome.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Huang, L., Chen, W., Liu, Y. et al. Improving neural machine translation using gated state network and focal adaptive attention network. Neural Comput & Applic 33, 15955–15967 (2021). https://doi.org/10.1007/s00521-021-06444-2