Abstract
Non-autoregressive translation (NAT) has recently attracted attention due to its high efficiency during inference. Unfortunately, it performs significantly worse than the autoregressive translation (AT) model. We observe that the gap between NAT and AT can be remarkably narrowed if the decoder inputs are provided in the same order as the target sentence. However, existing NAT models still initialize the decoding process by copying source inputs from left to right, and they lack an explicit reordering mechanism for the decoder inputs. To address this problem, we propose a novel distortion model that enhances the decoder inputs and thus further improves NAT models. The distortion model, incorporated into the NAT model, reorders the decoder inputs so that their word order is closer to that of the decoder outputs, which reduces the search space of the non-autoregressive decoder. We verify our approach empirically through a series of experiments on three similar language pairs (En⇒De, En⇒Ro, and De⇒En) and two dissimilar language pairs (Zh⇒En and En⇒Ja). Quantitative and qualitative analyses demonstrate the effectiveness and universality of our proposed approach.
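The paper's implementation is not reproduced here; the following is a minimal sketch of the idea, assuming that decoder inputs are obtained by uniformly copying source embeddings from left to right and that a distortion module predicts a permutation over the copied positions. All function names, tensor shapes, and the hand-written permutation are illustrative assumptions, not the authors' code.

# Illustrative sketch only: uniform copy of source embeddings to the decoder,
# followed by a reordering step standing in for the proposed distortion model.
import torch

def copy_source_to_decoder(src_emb: torch.Tensor, tgt_len: int) -> torch.Tensor:
    """Copy source embeddings (src_len x d) to tgt_len decoder positions, left to right."""
    src_len = src_emb.size(0)
    # Map each decoder position to a source position by uniform scaling.
    idx = (torch.arange(tgt_len).float() * src_len / tgt_len).long().clamp(max=src_len - 1)
    return src_emb[idx]

def apply_distortion(dec_inputs: torch.Tensor, permutation: torch.Tensor) -> torch.Tensor:
    """Reorder decoder inputs with a permutation (hand-written here; in the paper
    it would be predicted by the distortion model)."""
    return dec_inputs[permutation]

# Toy usage
src_emb = torch.randn(4, 8)            # 4 source tokens, embedding size 8
dec_in = copy_source_to_decoder(src_emb, tgt_len=5)
perm = torch.tensor([1, 0, 2, 4, 3])   # hypothetical predicted reordering
reordered = apply_distortion(dec_in, perm)
print(reordered.shape)                 # torch.Size([5, 8])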
Notes
- 1.
The relative position encodings \(\alpha\) and \(\beta\) are computed as \(\alpha_{ij} = w^{K}_{\mathrm{clip}(j-i,\,k)}\) and \(\beta_{ij} = w^{V}_{\mathrm{clip}(j-i,\,k)}\), where \(\mathrm{clip}(x, k) = \max(-k, \min(k, x))\), and \(i\) and \(j\) denote the absolute positions of the two tokens. Here \(w^{K}\) and \(w^{V}\) are learnable parameters, and we use \(k = 100\) in our experiments (see the sketch after these notes).
- 2.
- 3.
- 4.
- 5.
The corpora include LDC2000T50, LDC2002T01, LDC2002E18, LDC2003E07, LDC2003E14, LDC2003T17, and LDC2004T07.
- 6.
- 7.
- 8.
- 9.
We think that comparing NAT equipped with a re-scoring technique against the standard AT model is unfair, because the AT model can also improve its performance by reranking beam-search results [17].
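Note 1 above only states the clipping formula; the following is a minimal sketch of that relative position encoding in the style of Shaw et al. [24], assuming one learnable embedding table each for the key and value projections. Class and variable names are illustrative, not taken from the paper.

import torch
import torch.nn as nn

class RelativePosition(nn.Module):
    """Clipped relative position embeddings: alpha_ij = w^K_clip(j-i, k), beta_ij = w^V_clip(j-i, k)."""
    def __init__(self, d_head: int, k: int = 100):
        super().__init__()
        self.k = k
        # 2k + 1 learnable vectors cover the clipped relative distances in [-k, k].
        self.w_K = nn.Embedding(2 * k + 1, d_head)
        self.w_V = nn.Embedding(2 * k + 1, d_head)

    def forward(self, q_len: int, kv_len: int):
        i = torch.arange(q_len).unsqueeze(1)    # absolute positions of query tokens
        j = torch.arange(kv_len).unsqueeze(0)   # absolute positions of key/value tokens
        # clip(j - i, k) = max(-k, min(k, j - i)), shifted by +k to index the tables.
        rel = torch.clamp(j - i, -self.k, self.k) + self.k
        alpha = self.w_K(rel)                   # added on the key side of the attention logits
        beta = self.w_V(rel)                    # added on the value side of the weighted sum
        return alpha, beta

# Toy usage: a 5-token sequence attending to itself with 64-dimensional heads.
alpha, beta = RelativePosition(d_head=64)(q_len=5, kv_len=5)
print(alpha.shape, beta.shape)                  # torch.Size([5, 5, 64]) torch.Size([5, 5, 64])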
References
Al-Onaizan, Y., Papineni, K.: Distortion models for statistical machine translation. In: ACL 2006 (2006)
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: ICLR 2015 (2015)
Brown, P.F., Pietra, V.J.D., Pietra, S.A.D., Mercer, R.L.: The mathematics of statistical machine translation: parameter estimation. Comput. Linguist. 19(2), 263–311 (1993)
Chen, K., Wang, R., Utiyama, M., Sumita, E.: Neural machine translation with reordering embeddings. In: ACL 2019 (2019)
De Gispert, A., Iglesias, G., Byrne, B.: Fast and accurate preordering for SMT using neural networks. In: NAACL 2015 (2015)
Du, J., Way, A.: Pre-reordering for neural machine translation: helpful or harmful? Prague Bull. Math. Linguist. 108(1), 171–182 (2017)
Feng, S., Liu, S., Yang, N., Li, M., Zhou, M., Zhu, K.Q.: Improving attention modeling with implicit distortion and fertility for machine translation. In: COLING 2016 (2016)
Gehring, J., Auli, M., Grangier, D., Yarats, D., Dauphin, Y.N.: Convolutional sequence to sequence learning. In: ICML 2017 (2017)
Ghazvininejad, M., Levy, O., Liu, Y., Zettlemoyer, L.: Mask-predict: parallel decoding of conditional masked language models. In: EMNLP-IJCNLP 2019 (2019)
Gu, J., Bradbury, J., Xiong, C., Li, V.O., Socher, R.: Non-autoregressive neural machine translation. In: ICLR 2018 (2018)
Guo, J., Tan, X., He, D., Qin, T., Xu, L., Liu, T.Y.: Non-autoregressive neural machine translation with enhanced decoder input. In: AAAI 2019 (2019)
Kaiser, Ł., et al.: Fast decoding in sequence models using discrete latent variables. In: ICML 2018 (2018)
Kim, Y., Rush, A.M.: Sequence-level knowledge distillation. In: EMNLP 2016 (2016)
Koehn, P., Och, F.J., Marcu, D.: Statistical phrase-based translation. In: NAACL 2003 (2003)
Lee, J., Mansimov, E., Cho, K.: Deterministic non-autoregressive neural sequence modeling by iterative refinement. In: EMNLP 2018 (2018)
Lerner, U., Petrov, S.: Source-side classifier preordering for machine translation. In: EMNLP 2013 (2013)
Liu, Y., Zhou, L., Wang, Y., Zhao, Y., Zhang, J., Zong, C.: A comparable study on model averaging, ensembling and reranking in NMT. In: Zhang, M., Ng, V., Zhao, D., Li, S., Zan, H. (eds.) NLPCC 2018. LNCS (LNAI), vol. 11109, pp. 299–308. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99501-4_26
Ma, X., Zhou, C., Li, X., Neubig, G., Hovy, E.: FlowSeq: non-autoregressive conditional sequence generation with generative flow. In: EMNLP-IJCNLP 2019 (2019)
Nakagawa, T.: Efficient top-down BTG parsing for machine translation preordering. In: ACL-IJCNLP 2015 (2015)
Och, F.J., et al.: A smorgasbord of features for statistical machine translation. In: NAACL 2004 (2004)
Ran, Q., Lin, Y., Li, P., Zhou, J.: Guiding non-autoregressive neural machine translation decoding with reordering information. arXiv preprint arXiv:1911.02215 (2019)
Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units. In: ACL 2016 (2016)
Shao, C., Feng, Y., Zhang, J., Meng, F., Chen, X., Zhou, J.: Retrieving sequential information for non-autoregressive neural machine translation. In: ACL 2019 (2019)
Shaw, P., Uszkoreit, J., Vaswani, A.: Self-attention with relative position representations. In: NAACL 2018 (2018)
Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: NIPS 2014 (2014)
Tillmann, C.: A unigram orientation model for statistical machine translation. In: NAACL 2004 (2004)
Vaswani, A., et al.: Attention is all you need. In: NIPS 2017 (2017)
Vinyals, O., Fortunato, M., Jaitly, N.: Pointer networks. In: NIPS 2015 (2015)
Wang, Y., Zhou, L., Zhang, J., Zong, C.: Word, subword or character? An empirical study of granularity in Chinese-English NMT. In: Wong, D.F., Xiong, D. (eds.) CWMT 2017. CCIS, vol. 787, pp. 30–42. Springer, Singapore (2017). https://doi.org/10.1007/978-981-10-7134-8_4
Wang, Y., Tian, F., He, D., Qin, T., Zhai, C., Liu, T.Y.: Non-autoregressive machine translation with auxiliary regularization. In: AAAI 2019 (2019)
Wei, B., Wang, M., Zhou, H., Lin, J., Sun, X.: Imitation learning for non-autoregressive neural machine translation. In: ACL 2019 (2019)
Wu, Y., Schuster, M., Chen, Z., Le, Q.V., et al.: Google’s neural machine translation system: bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144 (2016)
Zhang, J., Zong, C.: Neural machine translation: Challenges, progress and future. arXiv preprint arXiv:2004.05809 (2020)
Zhang, J., Wang, M., Liu, Q., Zhou, J.: Incorporating word reordering knowledge into attention-based neural machine translation. In: ACL 2017 (2017)
Zhao, Y., Zhang, J., Zong, C.: Exploiting pre-ordering for neural machine translation. In: LREC 2018 (2018)
Zhou, L., Zhang, J., Yu, H., Zong, C.: Sequence generation: from both sides to the middle. In: IJCAI 2019 (2019)
Acknowledgments
This research has been funded by the Natural Science Foundation of China under Grant Nos. U1836221 and 61673380. It has also been supported by the Beijing Advanced Innovation Center for Language Resources and the Beijing Academy of Artificial Intelligence (BAAI2019QN0504).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhou, L., Zhang, J., Zhao, Y., Zong, C. (2020). Non-autoregressive Neural Machine Translation with Distortion Model. In: Zhu, X., Zhang, M., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2020. Lecture Notes in Computer Science, vol. 12430. Springer, Cham. https://doi.org/10.1007/978-3-030-60450-9_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60449-3
Online ISBN: 978-3-030-60450-9
eBook Packages: Computer Science, Computer Science (R0)