Abstract
The currently predominant token-to-token attention mechanism has demonstrated its ability to capture word dependencies in neural machine translation. However, this mechanism treats a sequence as a bag of tokens and computes the similarity between tokens without considering their intrinsic interactions. In this paper, we argue that this attention mechanism may miss the opportunity to take advantage of state information accumulated over multiple time steps. We therefore propose a Gated State Network, which manipulates the flow of state information with sequential characteristics. We also incorporate a Focal Adaptive Attention Network, which uses a Gaussian distribution to concentrate the attention distribution on a predicted focal position and its neighborhood. Experimental results on the WMT’14 English–German and WMT’17 Chinese–English translation tasks demonstrate the effectiveness of the proposed approach.
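To make the second idea concrete, the sketch below illustrates (under our own assumptions, not the authors' exact formulation) how an attention distribution can be concentrated around a predicted focal position with a Gaussian bias: the raw attention logits are penalized by the squared distance from the focal position before the softmax. The function name, tensor shapes, and the way the focal position and window are supplied are hypothetical; in the paper these quantities are predicted by the model.

```python
import torch
import torch.nn.functional as F

def focal_gaussian_attention(scores, focal_pos, window, key_len):
    """Illustrative sketch of Gaussian-focused attention (not the paper's exact method).

    scores:    (batch, query_len, key_len) unnormalized attention logits
    focal_pos: (batch, query_len) predicted focal position in [0, key_len)
    window:    (batch, query_len) predicted spread (standard deviation, > 0)
    """
    positions = torch.arange(key_len, dtype=scores.dtype, device=scores.device)
    # Squared distance of every key position from the predicted focal position.
    dist_sq = (positions.view(1, 1, -1) - focal_pos.unsqueeze(-1)) ** 2
    # Gaussian penalty: keys far from the focal position are down-weighted.
    gaussian_bias = -dist_sq / (2.0 * window.unsqueeze(-1) ** 2)
    return F.softmax(scores + gaussian_bias, dim=-1)

# Toy usage: 1 sentence, 4 target positions attending over 6 source tokens.
scores = torch.randn(1, 4, 6)
focal_pos = torch.tensor([[1.0, 2.0, 3.5, 4.0]])  # hypothetical predicted centers
window = torch.full((1, 4), 2.0)                  # hypothetical predicted spreads
attn = focal_gaussian_attention(scores, focal_pos, window, key_len=6)
print(attn.shape)  # torch.Size([1, 4, 6]); each row sums to 1
```

The effect is that each target position still attends to all source tokens, but probability mass is pulled toward the neighborhood of its focal position, which is the behavior the abstract describes.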
Acknowledgements
This work was partially supported by the National Key Research and Development Program of China under Grant 2018AAA0100202 and the National Science Foundation of China under Grant 61976043.
Ethics declarations
Conflict of interest
The authors declare that there is no known conflict of interest associated with this manuscript, and there has been no significant financial support for this work that could have influenced its outcome.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Huang, L., Chen, W., Liu, Y. et al. Improving neural machine translation using gated state network and focal adaptive attention network. Neural Comput & Applic 33, 15955–15967 (2021). https://doi.org/10.1007/s00521-021-06444-2