DOI: 10.1145/3457682.3457711

Accelerating Transformer for Neural Machine Translation

Published: 21 June 2021

ABSTRACT

Neural Machine Translation (NMT) models based on the Transformer achieve promising progress in both translation quality and training speed. This strong framework adopts parallel structures that greatly improve training speed without losing quality. However, because the self-attention network in the decoder cannot maintain parallelization under the auto-regressive decoding scheme, the Transformer does not enjoy the same speed at inference as it does during training. In this work, with simplicity and feasibility in mind, we introduce a gated cumulative attention network to replace the self-attention part of the Transformer decoder and maintain the parallelization property in the inference phase. The gated cumulative attention network consists of two sub-layers: a gated linearly cumulative layer that relates the already predicted tokens to the current representation, and a feature fusion layer that enhances the representation through a feature fusion operation. The proposed method was evaluated on WMT17 datasets covering 12 language pairs. Experimental results show the effectiveness of the proposed method and demonstrate that the gated cumulative attention network is an adequate alternative to the self-attention part of the Transformer decoder.
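
The full paper is not reproduced on this page, so the following PyTorch sketch only illustrates the idea described in the abstract: a cumulative summary of the already predicted tokens is combined with the current representation through learned gates, followed by a feature fusion projection. The class name, the cumulative-average formulation, and the two-gate design are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class GatedCumulativeAttention(nn.Module):
    """Illustrative sketch (not the paper's code): a gated cumulative layer
    plus a feature fusion sub-layer, standing in for decoder self-attention."""

    def __init__(self, d_model: int):
        super().__init__()
        self.gates = nn.Linear(2 * d_model, 2 * d_model)  # assumed two-gate design
        self.fusion = nn.Linear(2 * d_model, d_model)     # assumed feature fusion sub-layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) decoder-side representations.
        # Cumulative average over the prefix: computed in parallel at training
        # time, maintainable as a running sum (one add per step) at inference.
        steps = torch.arange(1, x.size(1) + 1, device=x.device).view(1, -1, 1)
        cum = x.cumsum(dim=1) / steps
        i_gate, f_gate = self.gates(torch.cat([x, cum], dim=-1)).chunk(2, dim=-1)
        h = torch.sigmoid(i_gate) * x + torch.sigmoid(f_gate) * cum  # gated combination
        return self.fusion(torch.cat([x, h], dim=-1))                # feature fusion


# Smoke test: a batch of 2 sequences of length 7 with model width 512.
layer = GatedCumulativeAttention(d_model=512)
out = layer(torch.randn(2, 7, 512))
assert out.shape == (2, 7, 512)
```

Whatever the exact formulation in the paper, the property this sketch is meant to show is that the inference-time state is a single cumulative vector per position, so decoding does not need to attend over the entire prefix at every step.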


Published in

ICMLC '21: Proceedings of the 2021 13th International Conference on Machine Learning and Computing
February 2021, 601 pages
ISBN: 9781450389310
DOI: 10.1145/3457682

    Copyright © 2021 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • research-article
    • Research
    • Refereed limited
