
Unsupervised Neural Machine Translation With Cross-Lingual Language Representation Agreement


Abstract:

Unsupervised cross-lingual language representation initialization methods, such as unsupervised bilingual word embedding (UBWE) pre-training and cross-lingual masked language model (CMLM) pre-training, together with mechanisms such as denoising and back-translation, have advanced unsupervised neural machine translation (UNMT), which has achieved impressive results on several language pairs, particularly French-English and German-English. Typically, UBWE initializes the word embedding layers of the UNMT encoder and decoder, whereas CMLM initializes the entire encoder and decoder. However, UBWE/CMLM training and UNMT training are independent, which makes it difficult to assess how the quality of UBWE/CMLM affects UNMT performance during UNMT training. In this paper, we first empirically explore the relationship between UNMT and UBWE/CMLM. The empirical results demonstrate that the performance of UBWE and CMLM significantly influences the performance of UNMT. Motivated by this, we propose a novel UNMT structure with cross-lingual language representation agreement that captures the interaction between UBWE/CMLM and UNMT during UNMT training. Experimental results on several language pairs demonstrate that the proposed UNMT models improve significantly over the corresponding state-of-the-art UNMT baselines.
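
The abstract does not spell out the agreement mechanism itself. As a rough, assumption-based illustration of the general idea (letting cross-lingual representation quality keep interacting with UNMT training instead of being fixed after initialization), the PyTorch sketch below adds an auxiliary agreement term that pulls the embeddings of induced word pairs together while the usual denoising and back-translation losses are optimized. All names here (agreement_loss, joint_loss, seed_pairs, agreement_weight) are hypothetical and not taken from the paper.

    import torch
    import torch.nn as nn

    def agreement_loss(src_emb: nn.Embedding,
                       tgt_emb: nn.Embedding,
                       seed_pairs: torch.Tensor) -> torch.Tensor:
        """Encourage embeddings of induced word pairs to stay close.

        seed_pairs: LongTensor of shape (n, 2) holding (source id, target id)
        pairs, e.g. produced by a UBWE-style dictionary induction step.
        (Hypothetical helper for illustration only.)
        """
        src_vec = src_emb(seed_pairs[:, 0])   # (n, d)
        tgt_vec = tgt_emb(seed_pairs[:, 1])   # (n, d)
        cos = nn.functional.cosine_similarity(src_vec, tgt_vec, dim=-1)
        return (1.0 - cos).mean()             # 1 - cosine similarity, averaged

    def joint_loss(denoising_loss: torch.Tensor,
                   back_translation_loss: torch.Tensor,
                   agree: torch.Tensor,
                   agreement_weight: float = 0.1) -> torch.Tensor:
        """Total objective: denoising + back-translation + weighted agreement."""
        return denoising_loss + back_translation_loss + agreement_weight * agree

    if __name__ == "__main__":
        # Toy usage with random embeddings and random induced pairs.
        src = nn.Embedding(1000, 512)
        tgt = nn.Embedding(1000, 512)
        pairs = torch.randint(0, 1000, (64, 2))
        print(agreement_loss(src, tgt, pairs).item())

The weighting keeps the auxiliary term from dominating the translation losses; in practice the weight and the source of the word pairs would be design choices of the actual method.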
Page(s): 1170 - 1182
Date of Publication: 20 March 2020
