Abstract
Recent unsupervised multi-modal machine translation methods have shown promising performance in capturing semantic relationships from unannotated monolingual corpora through large-scale pretraining. Empirical studies show that a small, accessible parallel corpus can yield performance gains comparable to those of large pretraining corpora in the unsupervised setting. Inspired by this observation, we argue that semi-supervised learning can largely reduce the demand for pretraining corpora without performance degradation in low-cost scenarios. However, the images in parallel corpora typically contain much irrelevant information, i.e., visual noise. Such noise has a negative impact on the semantic alignment between the source and target languages in semi-supervised learning, thus weakening the contribution of the parallel corpora. To effectively utilize valuable and expensive parallel corpora, we propose a Noise-robust Semi-supervised Multi-modal Machine Translation method (Semi-MMT). In particular, a visual cross-attention sublayer is introduced into the source- and target-language decoders, respectively, and the text representations are used as a guide to filter out visual noise. Building on the visual cross-attention, we further devise a hybrid training strategy that employs four unsupervised and two supervised tasks to reduce the mismatch between the semantic representation spaces of the source and target languages. Extensive experiments on the Multi30k dataset show that our method outperforms state-of-the-art unsupervised methods that use large-scale extra corpora for pretraining in terms of the METEOR metric, while requiring only 7% of the parallel corpora.
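The paper defines the visual cross-attention sublayer itself; purely as an illustration of the general idea (text representations querying image-region features so that attention weights suppress irrelevant regions), a minimal single-head sketch might look like the following. All names, shapes, and projection matrices here are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def visual_cross_attention(text_h, img_feats, W_q, W_k, W_v):
    """Text-guided attention over image regions (single head, no batching).

    text_h:    (T, d_text)  decoder hidden states for T tokens
    img_feats: (R, d_img)   features for R image regions
    W_q, W_k, W_v: learned projections into a shared dimension d
    Returns a (T, d) visual context per token and the (T, R) weights.
    """
    Q = text_h @ W_q                          # (T, d) text queries
    K = img_feats @ W_k                       # (R, d) region keys
    V = img_feats @ W_v                       # (R, d) region values
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # (T, R) scaled dot-product
    attn = softmax(scores, axis=-1)           # low weight on noisy regions
    return attn @ V, attn

# Toy shapes: 5 tokens, 49 regions (e.g., a 7x7 CNN feature map).
rng = np.random.default_rng(0)
ctx, attn = visual_cross_attention(
    rng.normal(size=(5, 16)), rng.normal(size=(49, 32)),
    rng.normal(size=(16, 8)), rng.normal(size=(32, 8)),
    rng.normal(size=(32, 8)))
```

Because the queries come from the text side, each token attends most to the regions that match its semantics; regions unrelated to any token receive small weights everywhere, which is one plausible reading of "using text representations as a guideline to filter visual noise."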
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China (62276196), the Key Research and Development Program of Hubei Province (No. 2021BAA030) and the China Scholarship Council (LiuJinMei [2020] 1509, 202106950041).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Li, L., Hu, K., Tayir, T., Liu, J., Lee, K.A. (2022). Noise-Robust Semi-supervised Multi-modal Machine Translation. In: Khanna, S., Cao, J., Bai, Q., Xu, G. (eds) PRICAI 2022: Trends in Artificial Intelligence. PRICAI 2022. Lecture Notes in Computer Science, vol 13630. Springer, Cham. https://doi.org/10.1007/978-3-031-20865-2_12
Print ISBN: 978-3-031-20864-5
Online ISBN: 978-3-031-20865-2