
Alleviating repetitive tokens in non-autoregressive machine translation with unlikelihood training


Abstract

In recent years, significant progress has been made in non-autoregressive machine translation. However, the accuracy of non-autoregressive models still lags behind that of their autoregressive counterparts, a gap largely attributable to the abundance of repetitive tokens in the target sequences generated by non-autoregressive models. In this study, we examine this phenomenon and propose a novel approach that trains a non-autoregressive model with an unlikelihood loss. We evaluate our method on three widely used benchmark tasks. The experimental results demonstrate that the proposed approach significantly reduces the number of repetitive tokens while improving the overall performance of non-autoregressive machine translation. Compared to the baseline model "Mask-Predict", the average number of repetitions on the IWSLT 14 DE→EN validation set is reduced from 0.48 to 0.17, a 62% decrease.
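For context, the unlikelihood objective referenced in the abstract builds on the general idea of Welleck et al. (2019): alongside the usual likelihood term, the model is penalized for assigning probability mass to a set of negative candidate tokens, typically tokens that would cause repetition. The sketch below is a minimal, illustrative PyTorch implementation of that token-level objective, using each position's preceding target tokens as negative candidates; it is not the paper's exact formulation, and the function name, tensor shapes, and the choice of candidate set are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def unlikelihood_loss(logits, targets, pad_id=0):
    """Token-level unlikelihood loss in the spirit of Welleck et al. (2019).

    For each position t, tokens that already appear in targets[:, :t] act as
    negative candidates; the loss pushes their predicted probability down,
    discouraging repetition. Illustrative sketch only, not the paper's exact
    candidate construction for non-autoregressive decoding.
    """
    # logits: (batch, seq_len, vocab); targets: (batch, seq_len) with token ids
    probs = F.softmax(logits, dim=-1)
    batch, seq_len, vocab = probs.shape

    # Candidate mask: for position t, mark tokens seen earlier in the target.
    cand = torch.zeros_like(probs, dtype=torch.bool)
    for t in range(1, seq_len):
        prev = targets[:, :t]                          # (batch, t)
        cand[:, t, :].scatter_(1, prev, True)          # mark previously seen tokens
    cand.scatter_(2, targets.unsqueeze(-1), False)     # never penalize the gold token
    cand[..., pad_id] = False                          # ignore padding as a candidate

    # Unlikelihood term: -log(1 - p(candidate)); clamp for numerical stability.
    ul = -torch.log(torch.clamp(1.0 - probs, min=1e-6)) * cand.float()
    token_mask = (targets != pad_id).float()
    return (ul.sum(dim=-1) * token_mask).sum() / token_mask.sum().clamp(min=1.0)
```

In practice such a term is mixed with the standard cross-entropy objective, e.g. `loss = nll + alpha * unlikelihood_loss(logits, targets)` with a small weight `alpha`, so that the model keeps fitting the reference translation while being discouraged from repeating tokens.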


Data availability

Enquiries about data availability should be directed to the authors.

References

  • Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473

  • Bahuleyan H, El Asri L (2020) Diverse keyphrase generation with neural unlikelihood training. In: Proceedings of the 28th International Conference on Computational Linguistics, pp 5271–5287

  • Bao Y, Zhou H, Huang S, Wang D, Qian L, Dai X, Chen J, Li L (2022) Latent-glat: glancing at latent variables for parallel text generation. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp 8398–8409

  • Clark K, Luong M-T, Le QV, Manning CD (2019) Electra: pre-training text encoders as discriminators rather than generators. In: International Conference on Learning Representations

  • Geng X, Feng X, Qin B (2021) Learning to rewrite for non-autoregressive neural machine translation. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp 3297–3308

  • Ghazvininejad M, Levy O, Liu Y, Zettlemoyer L (2019) Mask-predict: parallel decoding of conditional masked language models. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp 6114–6123

  • Gu J, Bradbury J, Xiong C, Li VOK, Socher R (2018) Non-autoregressive neural machine translation. In: International Conference on Learning Representations

  • Guo J, Tan X, He D, Qin T, Xu L, Liu T-Y (2019) Non-autoregressive neural machine translation with enhanced decoder input. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp 3723–3730

  • Guo J, Xu L, Chen E (2020) Jointly masked sequence-to-sequence model for non-autoregressive neural machine translation. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp 376–385

  • Huang F, Zhou H, Liu Y, Li H, Huang M (2022) Directed acyclic transformer for non-autoregressive machine translation. In: International Conference on Machine Learning, PMLR, pp 9410–9428

  • Huang C, Zhou H, Zaïane OR, Mou L, Li L (2022) Non-autoregressive translation with layer-wise prediction and deep supervision. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp 10776–10784

  • Kaiser Ł, Bengio S, Roy A, Vaswani A, Parmar N, Uszkoreit J, Shazeer N (2018) Fast decoding in sequence models using discrete latent variables. In: International Conference on Machine Learning

  • Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980

  • Lee J, Mansimov E, Cho K (2018) Deterministic non-autoregressive neural sequence modeling by iterative refinement. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp 1173–1182

  • Li Z, Lin Z, He D, Tian F, Qin T, Wang L, Liu T-Y (2019) Hint-based training for non-autoregressive machine translation. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp 5712–5717

  • Ma X, Zhou C, Li X, Neubig G, Hovy E (2019) Flowseq: non-autoregressive conditional sequence generation with generative flow. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp 4273–4283

  • Papineni K, Roukos S, Ward T, Zhu W-J (2002) Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pp 311–318

  • Qian L, Zhou H, Bao Y, Wang M, Qiu L, Zhang W, Yu Y, Li L (2021) Glancing transformer for non-autoregressive neural machine translation. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp 1993–2003

  • Ren Y, Liu J, Tan X, Zhao S, Zhao Z, Liu T-Y (2020) A study of non-autoregressive model for sequence generation. arXiv preprint arXiv:2004.10454

  • Sennrich R, Haddow B, Birch A (2016) Neural machine translation of rare words with subword units. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp 1715–1725

  • Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5998–6008

  • Wang Y, Tian F, He D, Qin T, Zhai CX, Liu T-Y (2019) Non-autoregressive machine translation with auxiliary regularization. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp 5377–5384

  • Wang S, Shi S, Huang H (2021) Improving non-autoregressive machine translation with soft-masking. In: CCF International Conference on Natural Language Processing and Chinese Computing, Springer, pp 141–152

  • Wang S, Huang H, Shi S (2023) Incorporating history and future into non-autoregressive machine translation. Comput Speech Lang 77:101439


  • Welleck S, Kulikov I, Roller S, Dinan E, Cho K, Weston J (2019) Neural text generation with unlikelihood training. arXiv preprint arXiv:1908.04319


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61732005, 61671064).

Funding

This work was supported by the National Natural Science Foundation of China (Nos. 61732005, 61671064).

Author information


Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Shuheng Wang. The first draft of the manuscript was written by Shuheng Wang and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Shumin Shi.

Ethics declarations

Conflict of interest

We declare that we have no commercial or associative interests that represent a conflict of interest in connection with the submitted work.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was not required for this type of study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, S., Shi, S. & Huang, H. Alleviating repetitive tokens in non-autoregressive machine translation with unlikelihood training. Soft Comput 28, 4681–4688 (2024). https://doi.org/10.1007/s00500-023-09490-1

