Unpaired Multimodal Neural Machine Translation via Reinforcement Learning

Wang, Yijun; Wei, Tianxin; Liu, Qi; Chen, Enhong

doi:10.1007/978-3-030-73197-7_11

Yijun Wang¹⁶,
Tianxin Wei¹⁷,
Qi Liu¹⁶ &
…
Enhong Chen¹⁶

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 12682))

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

2729 Accesses
1 Citations

Abstract

End-to-end neural machine translation (NMT) heavily relies on parallel corpora for training. However, high-quality parallel corpora are usually costly to collect. To tackle this problem, multimodal content, especially image, has been introduced to help build an NMT system without parallel corpora. In this paper, we propose a reinforcement learning (RL) method to build an NMT system by introducing a sequence-level supervision signal as a reward. Based on the fact that visual information can be a universal representation to ground different languages, we design two different rewards to guide the learning process, i.e., (1) the likelihood of generated sentence given source image and (2) the distance of attention weights given by image caption models. Experimental results on the Multi30K, IAPR-TC12, and IKEA datasets show that the proposed learning mechanism achieves better performance than existing methods.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

References

Bahdanau, D., et al.: An actor-critic algorithm for sequence prediction (2017)
Google Scholar
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: 3rd International Conference on Learning Representations (2015)
Google Scholar
Caglayan, O., et al.: Does multimodality help human and machine for translation and image captioning? In: 1st Conference on Machine Translation, vol. 2, pp. 627–633 (2016)
Google Scholar
Calixto, I., Liu, Q., Campbell, N.: Doubly-attentive decoder for multi-modal neural machine translation. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 1913–1924 (2017)
Google Scholar
Chen, Y., Wu, L., Zaki, M.J.: Reinforcement learning based graph-to-sequence model for natural question generation. In: 8th International Conference on Learning Representations (2020)
Google Scholar
Chen, Y., Liu, Y., Cheng, Y., Li, V.O.: A teacher-student framework for zero-resource neural machine translation. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 1925–1935 (2017)
Google Scholar
Chen, Y., Liu, Y., Li, V.O.: Zero-resource neural machine translation with multi-agent communication game. In: 32nd AAAI Conference on Artificial Intelligence (2018)
Google Scholar
Elliott, D., Frank, S., Sima’an, K., Specia, L.: Multi30k: multilingual English-German image descriptions. In: Proceedings of the 5th Workshop on Vision and Language, pp. 70–74 (2016)
Google Scholar
Firat, O., Sankaran, B., Al-Onaizan, Y., Vural, F.T.Y., Cho, K.: Zero-resource translation with multi-lingual neural machine translation. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 268–277 (2016)
Google Scholar
Gehring, J., Auli, M., Grangier, D., Yarats, D., Dauphin, Y.N.: Convolutional sequence to sequence learning. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 1243–1252. JMLR.org (2017)
Google Scholar
Gella, S., Sennrich, R., Keller, F., Lapata, M.: Image pivoting for learning multilingual multimodal representations. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2839–2845 (2017)
Google Scholar
Graça, Y.K.M., Ney, H.: When and why is unsupervised neural machine translation useless? In: 22nd Annual Conference of the European Association for Machine Translation, p. 35 (2020)
Google Scholar
Grubinger, M., Clough, P., Müller, H., Deselaers, T.: The IAPR TC-12 benchmark: a new evaluation resource for visual information systems. In: International Workshop OntoImage, vol. 2 (2006)
Google Scholar
Hashimoto, K., Tsuruoka, Y.: Accelerated reinforcement learning for sentence generation by vocabulary prediction. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 3115–3125 (2019)
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Google Scholar
Hitschler, J., Schamoni, S., Riezler, S.: Multimodal pivots for image caption translation. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 2399–2409 (2016)
Google Scholar
Johnson, M., et al.: Google’s multilingual neural machine translation system: enabling zero-shot translation. Trans. Assoc. Comput. Linguist. 5, 339–351 (2017)
Article Google Scholar
Karpathy, A., Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3128–3137 (2015)
Google Scholar
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: 3rd International Conference on Learning Representations (2015)
Google Scholar
Koehn, P., et al.: Moses: open source toolkit for statistical machine translation. In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, pp. 177–180 (2007)
Google Scholar
Li, J., Monroe, W., Shi, T., Jean, S., Ritter, A., Jurafsky, D.: Adversarial learning for neural dialogue generation. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2157–2169 (2017)
Google Scholar
Nakayama, H., Nishida, N.: Zero-resource machine translation by multimodal encoder-decoder network with multimedia pivot. Mach. Transl. 31(1–2), 49–64 (2017). https://doi.org/10.1007/s10590-017-9197-z
Article Google Scholar
Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 311–318. Association for Computational Linguistics (2002)
Google Scholar
Ranzato, M., Chopra, S., Auli, M., Zaremba, W.: Sequence level training with recurrent neural networks. In: 4th International Conference on Learning Representations (2016)
Google Scholar
Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 1715–1725 (2016)
Google Scholar
Shen, S., et al.: Minimum risk training for neural machine translation. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 1683–1692 (2016)
Google Scholar
Su, Y., Fan, K., Bach, N., Kuo, C.C.J., Huang, F.: Unsupervised multi-modal neural machine translation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10482–10491 (2019)
Google Scholar
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
Google Scholar
Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8(3–4), 229–256 (1992). https://doi.org/10.1007/BF00992696
Article MATH Google Scholar
Wu, L., Tian, F., Qin, T., Lai, J., Liu, T.Y.: A study of reinforcement learning for neural machine translation. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 3612–3621 (2018)
Google Scholar
Wu, Y., et al.: Google’s neural machine translation system: bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144 (2016)
Xu, K., et al.: Show, attend and tell: Neural image caption generation with visual attention. In: International Conference on Machine Learning, pp. 2048–2057 (2015)
Google Scholar
Young, P., Lai, A., Hodosh, M., Hockenmaier, J.: From image descriptions to visual denotations: new similarity metrics for semantic inference over event descriptions. Trans. Assoc. Comput. Linguist. 2, 67–78 (2014)
Article Google Scholar
Zheng, H., Cheng, Y., Liu, Y.: Maximum expected likelihood estimation for zero-resource neural machine translation. In: IJCAI, pp. 4251–4257 (2017)
Google Scholar
Zhou, M., Cheng, R., Lee, Y.J., Yu, Z.: A visual attention grounding neural model for multimodal machine translation. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 3643–3653 (2018)
Google Scholar

Download references

Acknowledgement

This research was partially supported by grants from the National Key Research and Development Program of China (No. 2016YFB1000904) and the National Natural Science Foundation of China (Nos. 61727809, 61922073 and U20A20229).

Author information

Authors and Affiliations

Anhui Province Key Laboratory of Big Data Analysis and Application, University of Science and Technology of China, Hefei, China
Yijun Wang, Qi Liu & Enhong Chen
Department of Computer Science and Technology, University of Science and Technology of China, Hefei, China
Tianxin Wei

Authors

Yijun Wang
View author publications
You can also search for this author in PubMed Google Scholar
Tianxin Wei
View author publications
You can also search for this author in PubMed Google Scholar
Qi Liu
View author publications
You can also search for this author in PubMed Google Scholar
Enhong Chen
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Enhong Chen .

Editor information

Editors and Affiliations

Aalborg University, Aalborg, Denmark
Christian S. Jensen
Singapore Management University, Singapore, Singapore
Ee-Peng Lim
Academia Sinica, Taipei, Taiwan
De-Nian Yang
The Pennsylvania State University, University Park, PA, USA
Wang-Chien Lee
National Chiao Tung University, Hsinchu, Taiwan
Vincent S. Tseng
Athens University of Economics and Business, Athens, Greece
Vana Kalogeraki
National Cheng Kung University, Tainan City, Taiwan
Jen-Wei Huang
National Tsing Hua University, Hsinchu, Taiwan
Chih-Ya Shen

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Wang, Y., Wei, T., Liu, Q., Chen, E. (2021). Unpaired Multimodal Neural Machine Translation via Reinforcement Learning. In: Jensen, C.S., et al. Database Systems for Advanced Applications. DASFAA 2021. Lecture Notes in Computer Science(), vol 12682. Springer, Cham. https://doi.org/10.1007/978-3-030-73197-7_11

Download citation

DOI: https://doi.org/10.1007/978-3-030-73197-7_11
Published: 06 April 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73196-0
Online ISBN: 978-3-030-73197-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics