
Improving Neural Machine Translation Model with Deep Encoding Information


Abstract

The availability of massive computational power, together with advances in deep neural network (DNN) technology, has driven rapid progress in machine translation. The strong representational capacity of deep neural networks allows neural machine translation (NMT) to exploit large-scale bilingual parallel corpora and modern computing resources to build highly effective translation models. Nevertheless, existing NMT models typically use only the output of the top encoder layer, while the information available in the other layers of the deep encoder is ignored, which significantly constrains translation performance. To address this issue, we propose a novel neural machine translation model that fully exploits the deep encoding information. The core idea is to aggregate the information from different encoder layers in different ways. We design three aggregation strategies: parallel layer, multi-layer, and dynamic layer aggregation of encoding information. The three corresponding translation models are trained and compared with a baseline Transformer model on a Chinese-to-English translation task. The experimental results show that the BLEU-4 score of the proposed model improves by 0.89 over the baseline, demonstrating the effectiveness of the proposed method.
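
To make the idea concrete, the following minimal PyTorch-style sketch (not the authors' exact architecture; the module and parameter names such as LayerAggregator are illustrative assumptions) shows one way to expose more than the top encoder layer to the decoder: a learned softmax-weighted mixture of all layer outputs, in the spirit of dynamic layer aggregation.

import torch
import torch.nn as nn

class LayerAggregator(nn.Module):
    """Combines per-layer encoder states into one representation for the decoder."""
    def __init__(self, num_layers: int, d_model: int):
        super().__init__()
        # One scalar weight per encoder layer, normalized with softmax.
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, layer_states):
        # layer_states: list of tensors, each (batch, src_len, d_model),
        # one per encoder layer, ordered bottom to top.
        stacked = torch.stack(layer_states, dim=0)            # (L, B, S, D)
        weights = torch.softmax(self.layer_logits, dim=0)     # (L,)
        mixed = (weights.view(-1, 1, 1, 1) * stacked).sum(0)  # (B, S, D)
        return self.proj(mixed)  # passed to the decoder's cross-attention

# Usage (hypothetical): collect the hidden state after every encoder layer, then aggregate.
# aggregator = LayerAggregator(num_layers=6, d_model=512)
# memory = aggregator([h1, h2, h3, h4, h5, h6])

The parallel-layer and multi-layer strategies described in the paper would combine the layer states differently; this sketch only illustrates the general mechanism of aggregating deep encoding information rather than relying on the top layer alone.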





Funding

This work was supported by the National Natural Science Foundation of China (No. U19A2059) and by the Ministry of Science and Technology of Sichuan Province Program (2020YFG0328).

Author information


Corresponding author

Correspondence to Tianxi Huang.

Ethics declarations

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.


About this article


Cite this article

Duan, G., Yang, H., Qin, K. et al. Improving Neural Machine Translation Model with Deep Encoding Information. Cogn Comput 13, 972–980 (2021). https://doi.org/10.1007/s12559-021-09860-7



  • DOI: https://doi.org/10.1007/s12559-021-09860-7
