Abstract
Non-autoregressive translation (NAT) has recently attracted attention due to its high efficiency during inference. Unfortunately, it performs significantly worse than the autoregressive translation (AT) model. We observe that the gap between NAT and AT can be remarkably narrowed if the decoder inputs are provided in the same order as the target sentence. However, existing NAT models still initialize the decoding process by copying source inputs from left to right, and they lack an explicit reordering mechanism for the decoder inputs. To address this problem, we propose a novel distortion model that enhances the decoder inputs and thus further improves NAT models. The distortion model, incorporated into the NAT model, reorders the decoder inputs so that their word order is closer to that of the decoder outputs, which reduces the search space of the non-autoregressive decoder. We verify our approach empirically through a series of experiments on three similar language pairs (En⇒De, En⇒Ro, and De⇒En) and two dissimilar language pairs (Zh⇒En and En⇒Ja). Quantitative and qualitative analyses demonstrate the effectiveness and universality of our proposed approach.
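The paper's implementation is not reproduced here; the following is a minimal sketch of the idea, assuming that decoder inputs are obtained by uniformly copying source embeddings from left to right and that a distortion module predicts a permutation over the copied positions. All function names, tensor shapes, and the hand-written permutation are illustrative assumptions, not the authors' code.

# Illustrative sketch only: uniform copy of source embeddings to the decoder,
# followed by a reordering step standing in for the proposed distortion model.
import torch

def copy_source_to_decoder(src_emb: torch.Tensor, tgt_len: int) -> torch.Tensor:
    """Copy source embeddings (src_len x d) to tgt_len decoder positions, left to right."""
    src_len = src_emb.size(0)
    # Map each decoder position to a source position by uniform scaling.
    idx = (torch.arange(tgt_len).float() * src_len / tgt_len).long().clamp(max=src_len - 1)
    return src_emb[idx]

def apply_distortion(dec_inputs: torch.Tensor, permutation: torch.Tensor) -> torch.Tensor:
    """Reorder decoder inputs with a permutation (hand-written here; in the paper
    it would be predicted by the distortion model)."""
    return dec_inputs[permutation]

# Toy usage
src_emb = torch.randn(4, 8)            # 4 source tokens, embedding size 8
dec_in = copy_source_to_decoder(src_emb, tgt_len=5)
perm = torch.tensor([1, 0, 2, 4, 3])   # hypothetical predicted reordering
reordered = apply_distortion(dec_in, perm)
print(reordered.shape)                 # torch.Size([5, 8])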
Notes
- 1.
The relative position encodings \(\alpha\) and \(\beta\) are computed as \(\alpha_{ij} = w^{K}_{\mathrm{clip}(j-i,\,k)}\) and \(\beta_{ij} = w^{V}_{\mathrm{clip}(j-i,\,k)}\), where \(\mathrm{clip}(x, k) = \max(-k, \min(k, x))\), and \(i\) and \(j\) denote the absolute positions of the two tokens. Here \(w^{K}\) and \(w^{V}\) are learnable parameters, and we use \(k = 100\) in our experiments (see the sketch after these notes).
- 2.
- 3.
- 4.
- 5.
The corpora include LDC2000T50, LDC2002T01, LDC2002E18, LDC2003E07, LDC2003E14, LDC2003T17, and LDC2004T07.
- 6.
- 7.
- 8.
- 9.
We think that comparing NAT equipped with a re-scoring technique against the standard AT model is unfair, because the AT model can also improve its performance by reranking beam-search results [17].
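Note 1 above only states the clipping formula; the following is a minimal sketch of that relative position encoding in the style of Shaw et al. [24], assuming one learnable embedding table each for the key and value projections. Class and variable names are illustrative, not taken from the paper.

import torch
import torch.nn as nn

class RelativePosition(nn.Module):
    """Clipped relative position embeddings: alpha_ij = w^K_clip(j-i, k), beta_ij = w^V_clip(j-i, k)."""
    def __init__(self, d_head: int, k: int = 100):
        super().__init__()
        self.k = k
        # 2k + 1 learnable vectors cover the clipped relative distances in [-k, k].
        self.w_K = nn.Embedding(2 * k + 1, d_head)
        self.w_V = nn.Embedding(2 * k + 1, d_head)

    def forward(self, q_len: int, kv_len: int):
        i = torch.arange(q_len).unsqueeze(1)    # absolute positions of query tokens
        j = torch.arange(kv_len).unsqueeze(0)   # absolute positions of key/value tokens
        # clip(j - i, k) = max(-k, min(k, j - i)), shifted by +k to index the tables.
        rel = torch.clamp(j - i, -self.k, self.k) + self.k
        alpha = self.w_K(rel)                   # added on the key side of the attention logits
        beta = self.w_V(rel)                    # added on the value side of the weighted sum
        return alpha, beta

# Toy usage: a 5-token sequence attending to itself with 64-dimensional heads.
alpha, beta = RelativePosition(d_head=64)(q_len=5, kv_len=5)
print(alpha.shape, beta.shape)                  # torch.Size([5, 5, 64]) torch.Size([5, 5, 64])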
References
Al-Onaizan, Y., Papineni, K.: Distortion models for statistical machine translation. In: ACL 2006 (2006)
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: ICLR 2015 (2015)
Brown, P.F., Pietra, V.J.D., Pietra, S.A.D., Mercer, R.L.: The mathematics of statistical machine translation: parameter estimation. Comput. Linguist. 19(2), 263–311 (1993)
Chen, K., Wang, R., Utiyama, M., Sumita, E.: Neural machine translation with reordering embeddings. In: ACL 2019 (2019)
De Gispert, A., Iglesias, G., Byrne, B.: Fast and accurate preordering for SMT using neural networks. In: NAACL 2015 (2015)
Du, J., Way, A.: Pre-reordering for neural machine translation: helpful or harmful? Prague Bull. Math. Linguist. 108(1), 171–182 (2017)
Feng, S., Liu, S., Yang, N., Li, M., Zhou, M., Zhu, K.Q.: Improving attention modeling with implicit distortion and fertility for machine translation. In: COLING 2016 (2016)
Gehring, J., Auli, M., Grangier, D., Yarats, D., Dauphin, Y.N.: Convolutional sequence to sequence learning. In: ICML 2017 (2017)
Ghazvininejad, M., Levy, O., Liu, Y., Zettlemoyer, L.: Mask-predict: parallel decoding of conditional masked language models. In: EMNLP-IJCNLP 2019 (2019)
Gu, J., Bradbury, J., Xiong, C., Li, V.O., Socher, R.: Non-autoregressive neural machine translation. In: ICLR 2018 (2018)
Guo, J., Tan, X., He, D., Qin, T., Xu, L., Liu, T.Y.: Non-autoregressive neural machine translation with enhanced decoder input. In: AAAI 2019 (2019)
Kaiser, Ł., et al.: Fast decoding in sequence models using discrete latent variables. In: ICML 2018 (2018)
Kim, Y., Rush, A.M.: Sequence-level knowledge distillation. In: EMNLP 2016 (2016)
Koehn, P., Och, F.J., Marcu, D.: Statistical phrase-based translation. In: NAACL 2003 (2003)
Lee, J., Mansimov, E., Cho, K.: Deterministic non-autoregressive neural sequence modeling by iterative refinement. In: EMNLP 2018 (2018)
Lerner, U., Petrov, S.: Source-side classifier preordering for machine translation. In: EMNLP 2013 (2013)
Liu, Y., Zhou, L., Wang, Y., Zhao, Y., Zhang, J., Zong, C.: A comparable study on model averaging, ensembling and reranking in NMT. In: Zhang, M., Ng, V., Zhao, D., Li, S., Zan, H. (eds.) NLPCC 2018. LNCS (LNAI), vol. 11109, pp. 299–308. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99501-4_26
Ma, X., Zhou, C., Li, X., Neubig, G., Hovy, E.: FlowSeq: non-autoregressive conditional sequence generation with generative flow. In: EMNLP-IJCNLP 2019 (2019)
Nakagawa, T.: Efficient top-down BTG parsing for machine translation preordering. In: ACL-IJCNLP 2015 (2015)
Och, F.J., et al.: A smorgasbord of features for statistical machine translation. In: NAACL 2004 (2004)
Ran, Q., Lin, Y., Li, P., Zhou, J.: Guiding non-autoregressive neural machine translation decoding with reordering information. arXiv preprint arXiv:1911.02215 (2019)
Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units. In: ACL 2016 (2016)
Shao, C., Feng, Y., Zhang, J., Meng, F., Chen, X., Zhou, J.: Retrieving sequential information for non-autoregressive neural machine translation. In: ACL 2019 (2019)
Shaw, P., Uszkoreit, J., Vaswani, A.: Self-attention with relative position representations. In: NAACL 2018 (2018)
Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: NIPS 2014 (2014)
Tillmann, C.: A unigram orientation model for statistical machine translation. In: NAACL 2004 (2004)
Vaswani, A., et al.: Attention is all you need. In: NIPS 2017 (2017)
Vinyals, O., Fortunato, M., Jaitly, N.: Pointer networks. In: NIPS 2015 (2015)
Wang, Y., Zhou, L., Zhang, J., Zong, C.: Word, subword or character? An empirical study of granularity in Chinese-English NMT. In: Wong, D.F., Xiong, D. (eds.) CWMT 2017. CCIS, vol. 787, pp. 30–42. Springer, Singapore (2017). https://doi.org/10.1007/978-981-10-7134-8_4
Wang, Y., Tian, F., He, D., Qin, T., Zhai, C., Liu, T.Y.: Non-autoregressive machine translation with auxiliary regularization. In: AAAI 2019 (2019)
Wei, B., Wang, M., Zhou, H., Lin, J., Sun, X.: Imitation learning for non-autoregressive neural machine translation. In: ACL 2019 (2019)
Wu, Y., Schuster, M., Chen, Z., Le, Q.V., et al.: Google’s neural machine translation system: bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144 (2016)
Zhang, J., Zong, C.: Neural machine translation: Challenges, progress and future. arXiv preprint arXiv:2004.05809 (2020)
Zhang, J., Wang, M., Liu, Q., Zhou, J.: Incorporating word reordering knowledge into attention-based neural machine translation. In: ACL 2017 (2017)
Zhao, Y., Zhang, J., Zong, C.: Exploiting pre-ordering for neural machine translation. In: LREC 2018 (2018)
Zhou, L., Zhang, J., Yu, H., Zong, C.: Sequence generation: from both sides to the middle. In: IJCAI 2019 (2019)
Acknowledgments
This research has been funded by the Natural Science Foundation of China under Grant Nos. U1836221 and 61673380. It has also been supported by the Beijing Advanced Innovation Center for Language Resources and the Beijing Academy of Artificial Intelligence (BAAI2019QN0504).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhou, L., Zhang, J., Zhao, Y., Zong, C. (2020). Non-autoregressive Neural Machine Translation with Distortion Model. In: Zhu, X., Zhang, M., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2020. Lecture Notes in Computer Science, vol. 12430. Springer, Cham. https://doi.org/10.1007/978-3-030-60450-9_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60449-3
Online ISBN: 978-3-030-60450-9
eBook Packages: Computer Science, Computer Science (R0)