DOI: 10.1145/3589334.3645527

NAT4AT: Using Non-Autoregressive Translation Makes Autoregressive Translation Faster and Better

Published: 13 May 2024

Abstract

With the increasing number of web documents, the demand for translation has grown dramatically. Non-autoregressive translation (NAT) models can significantly reduce decoding latency to meet this growing demand, but they sacrifice translation quality, and a persistent performance gap remains between NAT models and strong autoregressive translation (AT) models at the corpus level. Fine-grained comparisons between AT and NAT, however, are still lacking. In this paper, we therefore first conduct sentence-level analysis experiments and find that the translations generated by AT and NAT are both highly similar and complementary. Based on this observation, we propose a general and effective method called NAT4AT, which uses NAT not only to significantly speed up AT inference but also to improve its final translation quality. Specifically, NAT4AT first uses a NAT model to generate an initial translation in parallel and then uses an AT model as a correction model to revise the errors in that translation. In this way, the AT model no longer needs to predict the entire translation but only a small number of erroneous spans in the NAT output. Extensive experiments on major WMT benchmarks verify the generality and effectiveness of our method, which surpasses the translation quality of a strong AT model while achieving a 5.0x speedup.
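To make the draft-then-correct idea concrete, below is a minimal Python sketch of the decoding loop the abstract describes: a NAT model produces a complete draft in one parallel pass, and an AT model then verifies the draft and re-predicts only the positions where it disagrees. The helper callables `nat_draft` and `at_parallel_predict` are hypothetical placeholders rather than the authors' actual API, and the block-parallel verification style is borrowed from speculative decoding; the published NAT4AT algorithm may differ in how it locates and corrects errors.

```python
from typing import Callable, List

def nat4at_decode(
    src: List[int],
    nat_draft: Callable[[List[int]], List[int]],
    at_parallel_predict: Callable[[List[int], List[int]], List[int]],
    eos_id: int,
    max_rounds: int = 10,
) -> List[int]:
    """Draft with a NAT model, then let an AT model correct the draft.

    `at_parallel_predict(src, hyp)` is assumed to return, for every position i,
    the AT model's prediction for token i given hyp[:i]; with teacher forcing
    this is a single parallel forward pass rather than len(hyp) sequential steps.
    """
    hyp = nat_draft(src)          # one parallel NAT pass produces a full draft
    verified = 0                  # length of the prefix the AT model has accepted
    for _ in range(max_rounds):
        preds = at_parallel_predict(src, hyp)   # one parallel AT verification pass
        # Accept draft tokens as long as the AT model agrees with them.
        i = verified
        while i < len(hyp) and preds[i] == hyp[i]:
            i += 1
        if i >= len(hyp):         # the AT model agrees with the entire draft
            break
        # Overwrite the first disagreement with the AT prediction and re-verify
        # the remainder of the draft in the next round.
        hyp = hyp[:i] + [preds[i]] + hyp[i + 1:]
        verified = i + 1
        if preds[i] == eos_id:    # the correction ends the sentence early
            hyp = hyp[:verified]
            break
    return hyp
```

Because the AT model only performs a handful of parallel verification passes instead of one sequential step per output token, most tokens in the final translation are accepted directly from the NAT draft, which is where the reported speedup comes from.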

Supplemental Material

MP4 File: video presentation
MP4 File: supplemental video



      Published In

      WWW '24: Proceedings of the ACM Web Conference 2024
      May 2024
      4826 pages
ISBN: 9798400701719
DOI: 10.1145/3589334

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. efficient inference
      2. neural machine translation
      3. non-autoregressive generation

      Qualifiers

      • Research-article

      Funding Sources

      • NSFC grant
      • Shanghai Trusted Industry Internet Software Collaborative Innovation Center
      • National Key R&D Program of China

      Conference

WWW '24: The ACM Web Conference 2024
      May 13 - 17, 2024
      Singapore, Singapore

      Acceptance Rates

      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
