Non-autoregressive Neural Machine Translation with Distortion Model

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12430)

Abstract

Non-autoregressive translation (NAT) has attracted attention recently due to its high efficiency during inference. Unfortunately, it performs significantly worse than the autoregressive translation (AT) model. We observe that the gap between NAT and AT can be remarkably narrowed if the decoder inputs are provided in the same order as the target sentence. However, existing NAT models still initialize the decoding process by copying source inputs from left to right, and they lack an explicit mechanism for reordering the decoder inputs. To address this problem, we propose a novel distortion model that enhances the decoder inputs and thereby further improves NAT models. The distortion model, incorporated into the NAT model, reorders the decoder inputs so that their word order is close to that of the decoder outputs, which reduces the search space of the non-autoregressive decoder. We verify our approach empirically through a series of experiments on three similar language pairs (En\(\Rightarrow \)De, En\(\Rightarrow \)Ro, and De\(\Rightarrow \)En) and two dissimilar language pairs (Zh\(\Rightarrow \)En and En\(\Rightarrow \)Ja). Quantitative and qualitative analyses demonstrate the effectiveness and universality of our proposed approach.
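To make the copy-and-reorder idea concrete, the following is a minimal PyTorch sketch, not the paper's exact architecture: decoder inputs are first built by uniformly copying source embeddings from left to right (the standard NAT initialization), and a hypothetical distortion scorer then softly reorders the copied tokens toward the target word order. The module name, the linear scorer, and the Gaussian-style soft permutation are illustrative assumptions.

```python
import torch
import torch.nn as nn


class DistortedDecoderInputs(nn.Module):
    """Illustrative sketch: build NAT decoder inputs by uniform copy of source
    embeddings, then softly reorder them with a learned distortion score.
    This is a hypothetical simplification, not the authors' exact model."""

    def __init__(self, d_model: int):
        super().__init__()
        # Scores how far each copied token should move (a soft distortion).
        self.distortion_scorer = nn.Linear(d_model, 1)

    def forward(self, src_emb: torch.Tensor, tgt_len: int) -> torch.Tensor:
        # src_emb: (batch, src_len, d_model)
        batch, src_len, d_model = src_emb.size()

        # 1) Uniform copy: map each target position to a source position
        #    from left to right.
        index = torch.linspace(0, src_len - 1, steps=tgt_len, device=src_emb.device)
        index = index.round().long()                         # (tgt_len,)
        dec_inputs = src_emb[:, index, :]                     # (batch, tgt_len, d_model)

        # 2) Distortion: predict an offset for every copied token and softly
        #    permute the inputs toward the predicted target-side positions.
        offsets = self.distortion_scorer(dec_inputs).squeeze(-1)        # (batch, tgt_len)
        new_pos = torch.arange(tgt_len, device=src_emb.device).float() + offsets

        # Each output slot attends to copied tokens whose predicted position
        # is close to it (a differentiable stand-in for a hard permutation).
        slots = torch.arange(tgt_len, device=src_emb.device).float()
        weights = torch.softmax(-(slots[None, :, None] - new_pos[:, None, :]) ** 2, dim=-1)
        return torch.bmm(weights, dec_inputs)                 # (batch, tgt_len, d_model)
```

A real implementation would also need supervision for the reordering (e.g., derived from word alignments), which this sketch omits.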


Notes

  1. The relative position encodings \(\alpha \) and \(\beta \) are computed as \(\alpha _{ij} = w^{K}_{\mathrm{clip}(j-i,\,k)}\) and \(\beta _{ij} = w^{V}_{\mathrm{clip}(j-i,\,k)}\), where \(\mathrm{clip}(x, k) = \max (-k, \min (k, x))\), and i and j denote the absolute positions of the two tokens. Besides, \(w^{K}\) and \(w^{V}\) are learnable parameters, and we use k = 100 in our experiments (see the code sketch after these notes).

  2. http://www.statmt.org/wmt14/translation-task.html.

  3. http://www.statmt.org/wmt16/translation-task.html.

  4. https://wit3.fbk.eu/.

  5. The corpora include LDC2000T50, LDC2002T01, LDC2002E18, LDC2003E07, LDC2003E14, LDC2003T17, and LDC2004T07.

  6. http://isw3.naist.jp/~philip-a/emnlp2016/.

  7. https://github.com/tensorflow/tensor2tensor.

  8. https://github.com/clab/fast_align.

  9. We think that NAT with the re-scoring technique is an unfair comparison to the standard AT model, because the AT model can still improve its performance by reranking the beam-search results [17].
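For note 1, here is a small PyTorch sketch of the clipped relative position embeddings \(\alpha \) and \(\beta \), in the style of Shaw et al. [24]; the class name and tensor layout are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class RelativePositionEmbeddings(nn.Module):
    """Clipped relative position embeddings alpha (keys) and beta (values):
    alpha_ij = w^K_clip(j-i, k), beta_ij = w^V_clip(j-i, k),
    with clip(x, k) = max(-k, min(k, x))."""

    def __init__(self, d_head: int, k: int = 100):
        super().__init__()
        self.k = k
        # 2k + 1 possible clipped relative distances in [-k, k].
        self.w_K = nn.Embedding(2 * k + 1, d_head)
        self.w_V = nn.Embedding(2 * k + 1, d_head)

    def forward(self, length: int):
        pos = torch.arange(length)
        rel = pos[None, :] - pos[:, None]            # rel[i, j] = j - i
        rel = rel.clamp(-self.k, self.k) + self.k    # shift to index range [0, 2k]
        alpha = self.w_K(rel)                        # (length, length, d_head)
        beta = self.w_V(rel)                         # (length, length, d_head)
        return alpha, beta
```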

References

  1. Al-Onaizan, Y., Papineni, K.: Distortion models for statistical machine translation. In: ACL 2006 (2006)

  2. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: ICLR 2015 (2015)

  3. Brown, P.F., Pietra, V.J.D., Pietra, S.A.D., Mercer, R.L.: The mathematics of statistical machine translation: parameter estimation. Comput. Linguist. 19(2), 263–311 (1993)

  4. Chen, K., Wang, R., Utiyama, M., Sumita, E.: Neural machine translation with reordering embeddings. In: ACL 2019 (2019)

  5. De Gispert, A., Iglesias, G., Byrne, B.: Fast and accurate preordering for SMT using neural networks. In: NAACL 2015 (2015)

  6. Du, J., Way, A.: Pre-reordering for neural machine translation: helpful or harmful? Prague Bull. Math. Linguist. 108(1), 171–182 (2017)

  7. Feng, S., Liu, S., Yang, N., Li, M., Zhou, M., Zhu, K.Q.: Improving attention modeling with implicit distortion and fertility for machine translation. In: COLING 2016 (2016)

  8. Gehring, J., Auli, M., Grangier, D., Yarats, D., Dauphin, Y.N.: Convolutional sequence to sequence learning. In: ICML 2017 (2017)

  9. Ghazvininejad, M., Levy, O., Liu, Y., Zettlemoyer, L.: Mask-predict: parallel decoding of conditional masked language models. In: EMNLP-IJCNLP 2019 (2019)

  10. Gu, J., Bradbury, J., Xiong, C., Li, V.O., Socher, R.: Non-autoregressive neural machine translation. In: ICLR 2018 (2018)

  11. Guo, J., Tan, X., He, D., Qin, T., Xu, L., Liu, T.Y.: Non-autoregressive neural machine translation with enhanced decoder input. In: AAAI 2019 (2019)

  12. Kaiser, Ł., et al.: Fast decoding in sequence models using discrete latent variables. In: ICML 2018 (2018)

  13. Kim, Y., Rush, A.M.: Sequence-level knowledge distillation. In: EMNLP 2016 (2016)

  14. Koehn, P., Och, F.J., Marcu, D.: Statistical phrase-based translation. In: HLT-NAACL 2003 (2003)

  15. Lee, J., Mansimov, E., Cho, K.: Deterministic non-autoregressive neural sequence modeling by iterative refinement. In: EMNLP 2018 (2018)

  16. Lerner, U., Petrov, S.: Source-side classifier preordering for machine translation. In: EMNLP 2013 (2013)

  17. Liu, Y., Zhou, L., Wang, Y., Zhao, Y., Zhang, J., Zong, C.: A comparable study on model averaging, ensembling and reranking in NMT. In: Zhang, M., Ng, V., Zhao, D., Li, S., Zan, H. (eds.) NLPCC 2018. LNCS (LNAI), vol. 11109, pp. 299–308. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99501-4_26

  18. Ma, X., Zhou, C., Li, X., Neubig, G., Hovy, E.: FlowSeq: non-autoregressive conditional sequence generation with generative flow. In: EMNLP-IJCNLP 2019 (2019)

  19. Nakagawa, T.: Efficient top-down BTG parsing for machine translation preordering. In: ACL-IJCNLP 2015 (2015)

  20. Och, F.J., et al.: A smorgasbord of features for statistical machine translation. In: HLT-NAACL 2004 (2004)

  21. Ran, Q., Lin, Y., Li, P., Zhou, J.: Guiding non-autoregressive neural machine translation decoding with reordering information. arXiv preprint arXiv:1911.02215 (2019)

  22. Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units. In: ACL 2016 (2016)

  23. Shao, C., Feng, Y., Zhang, J., Meng, F., Chen, X., Zhou, J.: Retrieving sequential information for non-autoregressive neural machine translation. In: ACL 2019 (2019)

  24. Shaw, P., Uszkoreit, J., Vaswani, A.: Self-attention with relative position representations. In: NAACL 2018 (2018)

  25. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: NIPS 2014 (2014)

  26. Tillmann, C.: A unigram orientation model for statistical machine translation. In: NAACL 2004 (2004)

  27. Vaswani, A., et al.: Attention is all you need. In: NIPS 2017 (2017)

  28. Vinyals, O., Fortunato, M., Jaitly, N.: Pointer networks. In: NIPS 2015 (2015)

  29. Wang, Y., Zhou, L., Zhang, J., Zong, C.: Word, subword or character? An empirical study of granularity in Chinese-English NMT. In: Wong, D.F., Xiong, D. (eds.) CWMT 2017. CCIS, vol. 787, pp. 30–42. Springer, Singapore (2017). https://doi.org/10.1007/978-981-10-7134-8_4

  30. Wang, Y., Tian, F., He, D., Qin, T., Zhai, C., Liu, T.Y.: Non-autoregressive machine translation with auxiliary regularization. In: AAAI 2019 (2019)

  31. Wei, B., Wang, M., Zhou, H., Lin, J., Sun, X.: Imitation learning for non-autoregressive neural machine translation. In: ACL 2019 (2019)

  32. Wu, Y., Schuster, M., Chen, Z., Le, Q.V., et al.: Google’s neural machine translation system: bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144 (2016)

  33. Zhang, J., Zong, C.: Neural machine translation: challenges, progress and future. arXiv preprint arXiv:2004.05809 (2020)

  34. Zhang, J., Wang, M., Liu, Q., Zhou, J.: Incorporating word reordering knowledge into attention-based neural machine translation. In: ACL 2017 (2017)

  35. Zhao, Y., Zhang, J., Zong, C.: Exploiting pre-ordering for neural machine translation. In: LREC 2018 (2018)

  36. Zhou, L., Zhang, J., Yu, H., Zong, C.: Sequence generation: from both sides to the middle. In: IJCAI 2019 (2019)


Acknowledgments

This research work was funded by the Natural Science Foundation of China under Grant Nos. U1836221 and 61673380. The work was also supported by the Beijing Advanced Innovation Center for Language Resources and the Beijing Academy of Artificial Intelligence (BAAI2019QN0504).

Author information

Corresponding author

Correspondence to Long Zhou.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Zhou, L., Zhang, J., Zhao, Y., Zong, C. (2020). Non-autoregressive Neural Machine Translation with Distortion Model. In: Zhu, X., Zhang, M., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2020. Lecture Notes in Computer Science (LNAI), vol 12430. Springer, Cham. https://doi.org/10.1007/978-3-030-60450-9_32


  • DOI: https://doi.org/10.1007/978-3-030-60450-9_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60449-3

  • Online ISBN: 978-3-030-60450-9

  • eBook Packages: Computer Science, Computer Science (R0)
