DOI: 10.1145/3457682.3457711

Accelerating Transformer for Neural Machine Translation

Published: 21 June 2021

Abstract

Neural Machine Translation (NMT) models based on the Transformer achieve promising progress in both translation quality and training speed. This strong framework adopts parallel structures that greatly improve speed without losing quality. However, because the self-attention network in the decoder cannot maintain this parallelization under the auto-regressive scheme, the Transformer does not enjoy the same speed at inference as it does during training. In this work, with simplicity and feasibility in mind, we introduce a gated cumulative attention network that replaces the self-attention part of the Transformer decoder and maintains the parallelization property in the inference phase. The gated cumulative attention network consists of two sub-layers: a gated linearly cumulative layer that relates the already predicted tokens to the current representation, and a feature fusion layer that enhances the representation with a feature fusion operation. The proposed method was evaluated on the WMT17 datasets with 12 language pairs. Experimental results show the effectiveness of the proposed method and demonstrate that the gated cumulative attention network is an adequate alternative to the self-attention part of the Transformer decoder.
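The body of the paper is not reproduced on this page, so the sketch below is only one plausible, hedged reading of the two sub-layers described in the abstract: a position-wise cumulative summary of the already generated representations (similar in spirit to the average attention network of Zhang et al., ACL 2018), combined with the current token representation through a learned gate and a simple fusion projection with a residual connection. Every name here (gated_cumulative_attention, W_g, W_f, and so on) is an illustrative assumption, not an identifier taken from the paper.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_cumulative_attention(Y, W_g, b_g, W_f, b_f):
    """Hypothetical gated cumulative attention sub-layer (assumed form).

    Y: (T, d) decoder-side representations for T target positions.
    Returns a (T, d) array in which position t depends only on positions <= t,
    so step-by-step decoding can keep a running sum instead of attending over
    all previous tokens.
    """
    T, d = Y.shape
    # Gated linearly cumulative layer (assumed): a position-normalised running
    # sum of the representations generated so far.
    cum = np.cumsum(Y, axis=0) / np.arange(1, T + 1)[:, None]
    # A sigmoid gate decides how much cumulative context versus current token
    # representation to keep at each position.
    g = sigmoid(np.concatenate([Y, cum], axis=-1) @ W_g + b_g)
    gated = g * cum + (1.0 - g) * Y
    # Feature fusion layer (assumed): a linear projection over the concatenated
    # features plus a residual connection back to the input.
    return np.concatenate([Y, gated], axis=-1) @ W_f + b_f + Y

# Minimal usage example with random weights.
rng = np.random.default_rng(0)
T, d = 5, 8
Y = rng.standard_normal((T, d))
W_g = 0.1 * rng.standard_normal((2 * d, d))
W_f = 0.1 * rng.standard_normal((2 * d, d))
out = gated_cumulative_attention(Y, W_g, np.zeros(d), W_f, np.zeros(d))
print(out.shape)  # (5, 8)

Because each output position depends only on a running (cumulative) statistic of earlier positions, inference can update that statistic in constant time per generated token, which is the kind of decoding speed-up the abstract targets.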

Cited By

  • (2024) Research on English–Chinese machine translation shift based on word vector similarity. Artificial Life and Robotics 29(4), 585–589. https://doi.org/10.1007/s10015-024-00964-5. Online publication date: 16 September 2024.

Published In

ICMLC '21: Proceedings of the 2021 13th International Conference on Machine Learning and Computing
February 2021
601 pages
ISBN: 9781450389310
DOI: 10.1145/3457682
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 June 2021

Author Tags

  1. Machine Translation
  2. Natural Language Processing
  3. Neural Networks

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMLC 2021

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 21
  • Downloads (last 6 weeks): 0

Reflects downloads up to 10 Feb 2025
