ABSTRACT
Neural Machine Translation (NMT) models based on the Transformer achieve strong results in both translation quality and training speed. The framework's parallel structure greatly accelerates training without sacrificing quality. However, because the self-attention network in the decoder cannot be parallelized under the auto-regressive decoding scheme, the Transformer does not enjoy the same speed at inference as it does during training. In this work, with simplicity and feasibility in mind, we introduce a gated cumulative attention network that replaces the self-attention sub-layer in the Transformer decoder and preserves parallelization in the inference phase. The gated cumulative attention network comprises two sub-layers: a gated linearly cumulative layer that relates already predicted tokens to the current representation, and a feature fusion layer that enhances that representation through a feature fusion operation. The proposed method was evaluated on WMT17 datasets covering 12 language pairs. Experimental results show the effectiveness of the proposed method and demonstrate that the gated cumulative attention network is an adequate alternative to the self-attention sub-layer in the Transformer decoder.
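To make the idea concrete, below is a minimal PyTorch sketch of what such a layer could look like. The abstract does not give the exact formulation, so the cumulative average, the gating, and the fusion here are assumptions, loosely modeled on the average attention network of Zhang et al. (2018); the class and parameter names are hypothetical.

```python
# Hypothetical sketch of a gated cumulative attention layer; the precise
# equations of the paper are not given in the abstract, so this follows the
# general recipe: cumulative summary of past tokens -> gating -> feature fusion.
import torch
import torch.nn as nn


class GatedCumulativeAttention(nn.Module):
    """Drop-in replacement (assumed) for decoder self-attention."""

    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, 2 * d_model)  # input/forget gates
        self.fuse = nn.Linear(2 * d_model, d_model)      # feature fusion layer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        # Cumulative average over already generated positions. At inference
        # this reduces to one running sum per step, so each new token costs
        # O(1) instead of attending over the whole prefix.
        steps = torch.arange(1, seq_len + 1, device=x.device, dtype=x.dtype)
        cum = x.cumsum(dim=1) / steps.view(1, -1, 1)
        # Gated linear combination of the current representation and its
        # cumulative history (the "gated linearly cumulative" sub-layer).
        i, f = self.gate(torch.cat([x, cum], dim=-1)).chunk(2, dim=-1)
        gated = torch.sigmoid(i) * x + torch.sigmoid(f) * cum
        # Feature fusion: enhance the gated state with the original input.
        out = self.fuse(torch.cat([gated, x], dim=-1))
        return self.norm(out + x)                        # residual connection


# Usage sketch:
# layer = GatedCumulativeAttention(512)
# y = layer(torch.randn(2, 10, 512))  # (batch=2, seq_len=10, d_model=512)
```

The key property this sketch illustrates is that the only sequential state at decoding time is the running sum behind `cum`, which is what lets the layer keep the training-time parallelism at inference.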