research-article

Unsupervised Multimodal Machine Translation for Low-resource Distant Language Pairs

Published: 15 April 2024

Abstract

Unsupervised machine translation (UMT) has recently attracted growing attention because it enables translation between languages that lack parallel corpora. However, existing work mainly considers close language pairs (e.g., English-German and English-French), and the effectiveness of visual content for distant language pairs has yet to be investigated. This article proposes an unsupervised multimodal machine translation model for low-resource distant language pairs. Specifically, we first apply measures such as transliteration and re-ordering to bring distant language pairs closer together. We then use visual content to extend masked language modeling into visual masked language modeling for UMT. Finally, we conduct experiments on our distant language pair dataset and the public Multi30k dataset. The experimental results demonstrate the superior performance of our model, with BLEU improvements of 2.5 and 2.6 on the distant language pairs English-Uyghur and Chinese-Uyghur. Our model also performs well on close language pairs, improving BLEU by 2.3 over existing models on English-German.
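
The idea of extending masked language modeling with visual content can be illustrated with a small sketch. The abstract does not specify how the image and text are combined, so everything below is an assumption made for illustration: image region features are simply prepended to the text tokens, only text tokens are masked, and the names `MASK`, `mask_tokens`, and `build_visual_mlm_input` are hypothetical, not taken from the article.

```python
import random
from typing import List, Optional, Sequence, Tuple, Union

MASK = "[MASK]"  # hypothetical mask symbol, not taken from the article

Feature = Sequence[float]
InputItem = Tuple[str, Union[str, Feature]]


def mask_tokens(
    tokens: Sequence[str],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Randomly replace tokens with MASK.

    Returns the masked sequence and a parallel list of targets, where
    targets[i] is the original token at a masked position and None elsewhere.
    """
    rng = rng or random.Random()
    masked: List[str] = []
    targets: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets.append(tok)
        else:
            masked.append(tok)
            targets.append(None)
    # Guarantee at least one prediction target for non-empty input.
    if tokens and all(t is None for t in targets):
        i = rng.randrange(len(tokens))
        masked[i] = MASK
        targets[i] = tokens[i]
    return masked, targets


def build_visual_mlm_input(
    tokens: Sequence[str],
    region_features: Sequence[Feature],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[InputItem], List[Optional[str]]]:
    """Assumed layout: [image regions ...] + [masked text tokens ...].

    Image regions are never masked in this sketch; they only provide
    context that a model could use to recover the masked words.
    """
    masked, targets = mask_tokens(tokens, mask_prob, rng)
    inputs: List[InputItem] = [("img", feat) for feat in region_features]
    inputs += [("txt", tok) for tok in masked]
    labels: List[Optional[str]] = [None] * len(region_features) + targets
    return inputs, labels


if __name__ == "__main__":
    sentence = ["a", "dog", "runs", "on", "the", "grass"]
    regions = [[0.1, 0.2], [0.3, 0.4]]  # placeholder region feature vectors
    inputs, labels = build_visual_mlm_input(sentence, regions, rng=random.Random(0))
    for item, label in zip(inputs, labels):
        print(item, "->", label)
```

In this layout a model would be trained to predict each non-`None` label from the surrounding text and the image regions together, which is one plausible reading of "visual masked language modeling"; the article's actual architecture and masking scheme may differ.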

    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 4
    April 2024
    221 pages
    EISSN: 2375-4702
    DOI: 10.1145/3613577

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 April 2024
    Online AM: 09 March 2024
    Accepted: 05 March 2024
    Revised: 26 February 2024
    Received: 07 November 2023
    Published in TALLIP Volume 23, Issue 4

    Author Tags

    1. Visual masked language modeling
    2. unsupervised machine translation
    3. distant language pair
    4. image feature

    Qualifiers

    • Research-article

    Funding Sources

    • NSFC, China
