DOI: 10.1145/3457682.3457711

Accelerating Transformer for Neural Machine Translation

Published: 21 June 2021

Abstract

Neural Machine Translation (NMT) models based on the Transformer achieve promising progress in both translation quality and training speed. This strong framework adopts parallel structures that greatly improve speed without losing quality. However, because the self-attention network in the decoder cannot maintain this parallelization under the auto-regressive scheme, the Transformer does not enjoy the same speed at inference as it does during training. In this work, with simplicity and feasibility in mind, we introduce a gated cumulative attention network that replaces the self-attention part of the Transformer decoder and maintains the parallelization property in the inference phase. The gated cumulative attention network consists of two sub-layers: a gated linearly cumulative layer that relates the already predicted tokens to the current representation, and a feature fusion layer that enhances the representation with a feature fusion operation. The proposed method was evaluated on the WMT17 datasets with 12 language pairs. Experimental results show the effectiveness of the proposed method and demonstrate that the gated cumulative attention network is an adequate alternative to the self-attention part of the Transformer decoder.
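The body of the paper is not reproduced on this page, so the sketch below is only one plausible, hedged reading of the two sub-layers described in the abstract: a position-wise cumulative summary of the already generated representations (similar in spirit to the average attention network of Zhang et al., ACL 2018), combined with the current token representation through a learned gate and a simple fusion projection with a residual connection. Every name here (gated_cumulative_attention, W_g, W_f, and so on) is an illustrative assumption, not an identifier taken from the paper.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_cumulative_attention(Y, W_g, b_g, W_f, b_f):
    """Hypothetical gated cumulative attention sub-layer (assumed form).

    Y: (T, d) decoder-side representations for T target positions.
    Returns a (T, d) array in which position t depends only on positions <= t,
    so step-by-step decoding can keep a running sum instead of attending over
    all previous tokens.
    """
    T, d = Y.shape
    # Gated linearly cumulative layer (assumed): a position-normalised running
    # sum of the representations generated so far.
    cum = np.cumsum(Y, axis=0) / np.arange(1, T + 1)[:, None]
    # A sigmoid gate decides how much cumulative context versus current token
    # representation to keep at each position.
    g = sigmoid(np.concatenate([Y, cum], axis=-1) @ W_g + b_g)
    gated = g * cum + (1.0 - g) * Y
    # Feature fusion layer (assumed): a linear projection over the concatenated
    # features plus a residual connection back to the input.
    return np.concatenate([Y, gated], axis=-1) @ W_f + b_f + Y

# Minimal usage example with random weights.
rng = np.random.default_rng(0)
T, d = 5, 8
Y = rng.standard_normal((T, d))
W_g = 0.1 * rng.standard_normal((2 * d, d))
W_f = 0.1 * rng.standard_normal((2 * d, d))
out = gated_cumulative_attention(Y, W_g, np.zeros(d), W_f, np.zeros(d))
print(out.shape)  # (5, 8)

Because each output position depends only on a running (cumulative) statistic of earlier positions, inference can update that statistic in constant time per generated token, which is the kind of decoding speed-up the abstract targets.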

Cited By

  • (2024) Research on English–Chinese machine translation shift based on word vector similarity. Artificial Life and Robotics 29(4), 585–589. https://doi.org/10.1007/s10015-024-00964-5. Online publication date: 16 September 2024.

Published In

ICMLC '21: Proceedings of the 2021 13th International Conference on Machine Learning and Computing
February 2021
601 pages
ISBN: 9781450389310
DOI: 10.1145/3457682
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 June 2021

Author Tags

  1. Machine Translation
  2. Natural Language Processing
  3. Neural Networks

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMLC 2021

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 21
  • Downloads (last 6 weeks): 0

Reflects downloads up to 10 Feb 2025
