Abstract
The availability of massive computational power, together with advances in deep neural network (DNN) technology, has enabled rapid progress in machine translation. The strong representational capacity of deep neural networks allows neural machine translation (NMT) to exploit large-scale bilingual parallel corpora and this computing power to build highly effective translation models. Nevertheless, existing NMT models use only the information from the top encoder layer, while the information available in the remaining encoding layers is ignored, which significantly constrains translation performance. To address this issue, we propose a novel neural machine translation model that fully exploits the deep encoding information by aggregating the information from different encoding layers in different ways. We design three aggregation strategies: parallel layer, multi-layer, and dynamic layer aggregation of the encoding information. The three corresponding translation models are trained and compared with a baseline Transformer model on a Chinese-to-English translation task. The experimental results show that the BLEU-4 score of the proposed model is 0.89 points higher than that of the baseline, demonstrating the effectiveness of the proposed method.
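To make the core idea concrete, the following is a minimal, illustrative sketch (PyTorch-style, with hypothetical module and parameter names; it is not the authors' implementation) of one way to aggregate the outputs of all encoder layers with learned weights before passing the fused representation to the decoder, rather than using only the top layer.

```python
import torch
import torch.nn as nn

class EncoderLayerAggregation(nn.Module):
    """Illustrative layer-aggregation module: fuses the outputs of all
    encoder layers with learned softmax weights (a sketch of the general
    idea, not the paper's exact parallel/multi-layer/dynamic variants)."""

    def __init__(self, num_layers: int, d_model: int):
        super().__init__()
        # One scalar weight per encoder layer, normalized with softmax.
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))
        self.norm = nn.LayerNorm(d_model)

    def forward(self, layer_outputs):
        # layer_outputs: list of [batch, src_len, d_model] tensors,
        # one per encoder layer (not only the top layer).
        stacked = torch.stack(layer_outputs, dim=0)            # [L, B, S, D]
        weights = torch.softmax(self.layer_weights, dim=0)     # [L]
        fused = (weights.view(-1, 1, 1, 1) * stacked).sum(0)   # [B, S, D]
        return self.norm(fused)

# Hypothetical usage: fuse the per-layer outputs of a 6-layer encoder
# and feed the result to the decoder in place of the top-layer output.
agg = EncoderLayerAggregation(num_layers=6, d_model=512)
layer_outputs = [torch.randn(2, 10, 512) for _ in range(6)]
fused = agg(layer_outputs)   # shape [2, 10, 512]
```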
Funding
This work was supported by the National Natural Science Foundation of China (No. U19A2059) and by the Ministry of Science and Technology of Sichuan Province Program (2020YFG0328).
Ethics declarations
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Cite this article
Duan, G., Yang, H., Qin, K. et al. Improving Neural Machine Translation Model with Deep Encoding Information. Cogn Comput 13, 972–980 (2021). https://doi.org/10.1007/s12559-021-09860-7