research-article

Unsupervised Multimodal Machine Translation for Low-resource Distant Language Pairs

Authors:

Lin Li

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 4

Article No.: 55, Pages 1 - 22

https://doi.org/10.1145/3652161

Published: 15 April 2024

Abstract

Unsupervised machine translation (UMT) has attracted growing attention because it enables models to translate between languages that lack parallel corpora. However, existing work mainly considers close language pairs (e.g., English-German and English-French), and the effectiveness of visual content for distant language pairs has yet to be investigated. This article proposes an unsupervised multimodal machine translation model for low-resource distant language pairs. Specifically, we first apply transliteration and re-ordering to bring distant language pairs closer together. We then extend masked language modeling with visual content, yielding visual masked language modeling for UMT. Finally, we conduct experiments on our distant language pair dataset and the public Multi30k dataset. The results show BLEU improvements of 2.5 and 2.6 on translation for the distant language pairs English-Uyghur and Chinese-Uyghur. The model also improves results for close language pairs, gaining 2.3 BLEU over existing models on English-German.
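
As an illustration of the general idea behind visual masked language modeling, the following minimal sketch masks some text tokens and pairs the masked sentence with unmasked image-region features, so that a model could use the visual context to recover the hidden words. It is not the article's implementation: the function names, the `[MASK]` symbol, the 15% default masking rate (borrowed from common BERT-style practice), and the toy region features are all assumptions made for this example.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, chosen for this sketch


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace each token with MASK with probability mask_prob.

    Returns the masked sequence and a dict mapping each masked
    position to its original token (the prediction targets).
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets


def build_vmlm_example(tokens, region_features, mask_prob=0.15, seed=0):
    """Pair a masked sentence with its image-region features.

    Only the text side is masked; the visual features are passed
    through unchanged (an assumption of this sketch).
    """
    masked, targets = mask_tokens(tokens, mask_prob, seed)
    return {
        "text": masked,
        "visual": [list(r) for r in region_features],
        "targets": targets,
    }


if __name__ == "__main__":
    sentence = ["a", "dog", "runs", "on", "the", "grass"]
    regions = [[0.1, 0.2], [0.3, 0.4]]  # toy stand-ins for detector outputs
    example = build_vmlm_example(sentence, regions, mask_prob=0.5, seed=1)
    print(example["text"])
    print(example["targets"])
```

In a real system the region features would come from an object detector and the masked positions would be predicted by a Transformer encoder that attends over both modalities; the sketch only shows how such a training example might be assembled.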

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Machine translation

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing Volume 23, Issue 4

April 2024

221 pages

EISSN:2375-4702

DOI:10.1145/3613577

Editor:
Imed Zitouni
Google, USA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 April 2024

Online AM: 09 March 2024

Accepted: 05 March 2024

Revised: 26 February 2024

Received: 07 November 2023

Published in TALLIP Volume 23, Issue 4

Qualifiers

Research-article

Funding Sources

NSFC, China
