DOI: 10.1145/3589334.3645527

NAT4AT: Using Non-Autoregressive Translation Makes Autoregressive Translation Faster and Better

Published: 13 May 2024

Abstract

With the increasing number of web documents, the demand for translation has grown dramatically. Non-autoregressive translation (NAT) models can significantly reduce decoding latency to meet this growing demand, but they sacrifice translation quality, and a persistent performance gap remains between NAT models and strong autoregressive translation (AT) models at the corpus level. Fine-grained comparisons between AT and NAT, however, are still lacking. In this paper, we therefore first conduct sentence-level analysis experiments and find that the translations generated by AT and NAT are both highly similar and complementary. Based on this observation, we propose a general and effective method called NAT4AT, which uses NAT not only to significantly speed up AT inference but also to improve its final translation quality. Specifically, NAT4AT first uses a NAT model to generate an initial translation in parallel and then uses an AT model as a correction model to revise the errors in that translation. In this way, the AT model no longer needs to predict the entire translation but only a small number of erroneous spans in the NAT output. Extensive experiments on major WMT benchmarks verify the generality and effectiveness of our method, which surpasses the translation quality of a strong AT model while achieving a 5.0x speedup.
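To make the draft-then-correct idea concrete, below is a minimal Python sketch of the decoding loop the abstract describes: a NAT model produces a complete draft in one parallel pass, and an AT model then verifies the draft and re-predicts only the positions where it disagrees. The helper callables `nat_draft` and `at_parallel_predict` are hypothetical placeholders rather than the authors' actual API, and the block-parallel verification style is borrowed from speculative decoding; the published NAT4AT algorithm may differ in how it locates and corrects errors.

```python
from typing import Callable, List

def nat4at_decode(
    src: List[int],
    nat_draft: Callable[[List[int]], List[int]],
    at_parallel_predict: Callable[[List[int], List[int]], List[int]],
    eos_id: int,
    max_rounds: int = 10,
) -> List[int]:
    """Draft with a NAT model, then let an AT model correct the draft.

    `at_parallel_predict(src, hyp)` is assumed to return, for every position i,
    the AT model's prediction for token i given hyp[:i]; with teacher forcing
    this is a single parallel forward pass rather than len(hyp) sequential steps.
    """
    hyp = nat_draft(src)          # one parallel NAT pass produces a full draft
    verified = 0                  # length of the prefix the AT model has accepted
    for _ in range(max_rounds):
        preds = at_parallel_predict(src, hyp)   # one parallel AT verification pass
        # Accept draft tokens as long as the AT model agrees with them.
        i = verified
        while i < len(hyp) and preds[i] == hyp[i]:
            i += 1
        if i >= len(hyp):         # the AT model agrees with the entire draft
            break
        # Overwrite the first disagreement with the AT prediction and re-verify
        # the remainder of the draft in the next round.
        hyp = hyp[:i] + [preds[i]] + hyp[i + 1:]
        verified = i + 1
        if preds[i] == eos_id:    # the correction ends the sentence early
            hyp = hyp[:verified]
            break
    return hyp
```

Because the AT model only performs a handful of parallel verification passes instead of one sequential step per output token, most tokens in the final translation are accepted directly from the NAT draft, which is where the reported speedup comes from.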

Supplemental Material

MP4 File: video presentation
MP4 File: supplemental video



      Published In

      WWW '24: Proceedings of the ACM Web Conference 2024
      May 2024
      4826 pages
ISBN: 9798400701719
DOI: 10.1145/3589334

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. efficient inference
      2. neural machine translation
      3. non-autoregressive generation

      Qualifiers

      • Research-article

      Funding Sources

      • NSFC grant
      • Shanghai Trusted Industry Internet Software Collaborative Innovation Center
      • National Key R&D Program of China

      Conference

WWW '24: The ACM Web Conference 2024
      May 13 - 17, 2024
      Singapore, Singapore

      Acceptance Rates

      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
