
Exploiting multiple correlated modalities can enhance low-resource machine translation quality

Published in: Multimedia Tools and Applications

Abstract

In an effort to enhance the machine translation (MT) quality of low-resource languages, we report the first study on multimodal machine translation (MMT) for the Manipuri→English, Manipuri→Hindi and Manipuri→German language pairs. Manipuri is a morphologically rich language with limited resources that can be computationally utilized, and no MMT dataset has been reported for these language pairs to date. To build the parallel datasets, we collected news articles containing images and associated text in English from a local daily newspaper and used English as the pivot language. The machine-translated outputs of existing translation systems for these languages were manually post-edited to build the datasets. In addition to text, we build MT systems by exploiting features from images and audio recordings in the source language, i.e., Manipuri. We carried out an extensive analysis of the MT systems trained with text-only and multimodal inputs, using both automatic metrics and human evaluation. Our findings attest that integrating multiple correlated modalities enhances MT performance in low-resource settings, achieving a significant improvement of up to +3 BLEU. The human assessment revealed that the fluency score of the MMT systems depends on the type of correlated auxiliary modality.


Data Availability

The dataset used in this work is available from Imphal Free Press subject to a licensing agreement. A request may be made to the authors to gain access to the data with permission from Imphal Free Press. A sample of the dataset is available on GitHub (https://github.com/LSMeetei/MnMultimodal) for reference.

Notes

  1. http://censusindia.gov.in

  2. https://ifp.co.in/

  3. Acronyms: O = Object, S = Subject, V = Verb.

  4. https://ifp.co.in/

  5. https://indicnlp.ai4bharat.org/indic-trans/

  6. https://www.deepl.com/

  7. Also known as the F1-score: the harmonic mean of precision and recall.

  8. https://anoopkunchukuttan.github.io/indic_nlp_library/

  9. BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.5.1

  10. chrF2+numchars.6+space.false+version.1.5.1

  11. TER+tok.tercom-nonorm-punct-noasian-uncased+version.1.5.1
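The strings in notes 9–11 are sacrebleu metric signatures. As an illustration of notes 7 and 10, a minimal stdlib sketch of the chrF score (character n-gram F-score with β = 2 and n-grams of up to 6 characters, spaces excluded) might look like the following; this is a simplified illustration of the metric's definition, not sacrebleu's implementation:

```python
from collections import Counter

def char_ngrams(text, n):
    # Character n-grams with spaces removed (the "space.false" setting)
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    # Average F-beta over n = 1..max_n; chrF2 uses beta=2, numchars=6
    scores = []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        prec = overlap / sum(hyp.values())
        rec = overlap / sum(ref.values())
        if prec + rec == 0:
            scores.append(0.0)
            continue
        scores.append((1 + beta**2) * prec * rec / (beta**2 * prec + rec))
    return 100 * sum(scores) / len(scores) if scores else 0.0
```

For corpus-level scoring against the signatures reported above, the sacrebleu toolkit itself (reference 32) should be used rather than a sketch like this.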

References

  1. Anastasopoulos A, Bojar O, Bremerman J, Cattoni R, Elbayad M, Federico M, Wiesner M (2021) Findings of the IWSLT 2021 Evaluation Campaign. In: Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021), Online. https://doi.org/10.18653/v1/2021.iwslt-1.1

  2. Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv:1409.0473

  3. Bansal M, Lobiyal DK (2021) Multilingual sequence to sequence convolutional machine translation. Multimedia Tools and Applications 80(25):33701–33726. https://doi.org/10.1007/s11042-021-11345-6


  4. Caglayan O, Aransa W, Bardet A, Garcia-Martinez M, Bougares F, Barrault L, Van de Weijer J (2017) LIUM-CVC submissions for WMT17 multimodal translation task. arXiv:1707.04481. https://doi.org/10.48550/arXiv.1707.04481

  5. Caglayan O, Aransa W, Wang Y, Masana M, Garcia-Martinez M, Bougares F, Van de Weijer J (2016) Does multimodality help human and machine for translation and image captioning? arXiv:1605.09186. https://doi.org/10.48550/arXiv.1605.09186

  6. Caglayan O, Madhyastha P, Specia L, Barrault L (2019) Probing the need for visual context in multimodal machine translation. arXiv:1903.08678. https://doi.org/10.48550/arXiv.1903.08678

  7. Calixto I, Liu Q (2017) Incorporating Global Visual Features into Attention-based Neural Machine Translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (pp. 992-1003). https://doi.org/10.18653/v1/D17-1105

  8. Dhanjal AS, Singh W (2022) An automatic machine translation system for multi-lingual speech to Indian sign language. Multimedia Tools and Applications, 1-39. https://doi.org/10.1007/s11042-021-11706-1

  9. Elliott D, Frank S, Sima’an K, Specia L (2016) Multi30k: Multilingual english-german image descriptions. arXiv:1605.00459. https://doi.org/10.48550/arXiv.1605.00459

  10. Gulcehre C, Firat O, Xu K, Cho K, Barrault L, Lin HC, Bengio Y (2015) On using monolingual corpora in neural machine translation. arXiv:1503.03535. https://doi.org/10.48550/arXiv.1503.03535

  11. Hirasawa T, Yang Z, Komachi M, Okazaki N (2020) Keyframe Segmentation and Positional Encoding for Video-guided Machine Translation Challenge 2020. arXiv:2006.12799. https://doi.org/10.48550/arXiv.2006.12799

  12. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural computation 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735


  13. Huang PY, Liu F, Shiang SR, Oh J, Dyer C (2016, August) Attention-based multimodal neural machine translation. In Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers (pp. 639-645). https://doi.org/10.18653/v1/W16-2360

  14. Kakwani D, Kunchukuttan A, Golla S, Gokul NC, Bhattacharyya A, Khapra MM, Kumar P (2020) IndicNLPSuite: Monolingual corpora, evaluation benchmarks and pre-trained multilingual language models for Indian languages. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings (pp. 4948-4961). https://doi.org/10.18653/v1/2020.findings-emnlp.445

  15. Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. arXiv:1412.6980. https://doi.org/10.48550/arXiv.1412.6980

  16. Klein G, Hernandez F, Nguyen V, Senellart J (2020, October) The OpenNMT neural machine translation toolkit: 2020 edition. In Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track) (pp. 102-109). https://aclanthology.org/2020.amta-research.9

  17. Kocabiyikoglu AC, Besacier L, Kraif O (2018) Augmenting librispeech with french translations: A multimodal corpus for direct speech translation evaluation. arXiv:1802.03142. https://doi.org/10.48550/arXiv.1802.03142

  18. Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Herbst E (2007) Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th annual meeting of the association for computational linguistics companion volume proceedings of the demo and poster sessions (pp. 177-180). https://aclanthology.org/P07-2045

  19. Koehn P, Och FJ, Marcu D (2003) Statistical phrase-based translation. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1 (pp. 48-54). https://doi.org/10.3115/1073445.1073462

  20. Lee JY (2019) Deep multimodal embedding for video captioning. Multimedia Tools and Applications 78(22):31793–31805. https://doi.org/10.1007/s11042-019-08011-3


  21. Luong MT, Pham H, Manning CD (2015) Effective Approaches to Attention-based Neural Machine Translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (pp. 1412-1421). https://doi.org/10.18653/v1/D15-1166

  22. Mao J, Xu W, Yang Y, Wang J, Yuille AL (2014) Explain images with multimodal recurrent neural networks. arXiv:1410.1090. https://doi.org/10.48550/arXiv.1410.1090

  23. Meetei LS, Rahul L, Singh A, Singh SM, Singh TD, Bandyopadhyay S (2021) An Experiment on Speech-to-Text Translation Systems for Manipuri to English on Low Resource Setting. In Proceedings of the 18th International Conference on Natural Language Processing (ICON) (pp. 54-63). https://aclanthology.org/2021.icon-main.8

  24. Meetei LS, Singh TD, Bandyopadhyay S (2019) WAT2019: English-Hindi translation on Hindi visual genome dataset. In Proceedings of the 6th Workshop on Asian Translation (pp. 181-188). https://doi.org/10.18653/v1/D19-5224

  25. Meetei LS, Singh TD, Bandyopadhyay S, Vela M, van Genabith J (2020) English to Manipuri and Mizo Post-Editing Effort and its Impact on Low Resource Machine Translation. In Proceedings of the 17th International Conference on Natural Language Processing (ICON) (pp. 50-59). https://aclanthology.org/2020.icon-main.7

  26. Meetei LS, Singh SM, Singh A, Das R, Singh TD, Bandyopadhyay S (2023) Hindi to English Multimodal Machine Translation on News Dataset in Low Resource Setting. Procedia Computer Science 218:2102–2109. https://doi.org/10.1016/j.procs.2023.01.186


  27. Ney H (1999) Speech translation: Coupling of recognition and translation. In 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No. 99CH36258) (Vol. 1, pp. 517-520). IEEE. https://doi.org/10.1109/ICASSP.1999.758176

  28. Papineni K, Roukos S, Ward T, Zhu WJ (2002) Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics (pp. 311-318). https://doi.org/10.3115/1073083.1073135

  29. Parida S, Bojar O, Dash SR (2019) Hindi visual genome: A dataset for multi-modal english to hindi machine translation. Computación y Sistemas 23(4):1499–1505. https://doi.org/10.13053/cys-23-4-3294


  30. Pham NQ, Nguyen TS, Ha TL, Hussain J, Schneider F, Niehues J, Waibel A (2019) The iwslt 2019 kit speech translation system. In Proceedings of the 16th International Conference on Spoken Language Translation. https://aclanthology.org/2019.iwslt-1.3

  31. Popović M (2015) chrF: character n-gram F-score for automatic MT evaluation. In Proceedings of the Tenth Workshop on Statistical Machine Translation (pp. 392-395). https://doi.org/10.18653/v1/W15-3049

  32. Post M (2018) A Call for Clarity in Reporting BLEU Scores. In Proceedings of the Third Conference on Machine Translation: Research Papers (pp. 186-191). https://doi.org/10.18653/v1/W18-6319

  33. Rahul L, Meetei LS, Jayanna HS (2021) Statistical and Neural Machine Translation for Manipuri-English on Intelligence Domain. In Advances in Computing and Network Communications (pp. 249-257). Springer, Singapore. https://doi.org/10.1007/978-981-33-6987-0_21

  34. Sanabria R, Caglayan O, Palaskar S, Elliott D, Barrault L, Specia L, Metze F (2018) How2: a large-scale dataset for multimodal language understanding. arXiv:1811.00347. https://doi.org/10.48550/arXiv.1811.00347

  35. Schuster M, Paliwal KK (1997) Bidirectional recurrent neural networks. IEEE transactions on Signal Processing 45(11):2673–2681. https://doi.org/10.1109/78.650093


  36. Sennrich R, Haddow B, Birch A (2015) Improving neural machine translation models with monolingual data. arXiv:1511.06709. https://doi.org/10.48550/arXiv.1511.06709

  37. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556. https://doi.org/10.48550/arXiv.1409.1556

  38. Singh SM, Meetei LS, Singh TD, Bandyopadhyay S (2021) Multiple captions embellished multilingual multi-modal neural machine translation. In Proceedings of the First Workshop on Multimodal Machine Translation for Low Resource Languages (MMTLRL 2021) (pp. 2-11). https://aclanthology.org/2021.mmtlrl-1.2

  39. Singh TD (2013) Taste of Two Different Flavours: Which Manipuri Script Works Better for English-Manipuri Language Pair SMT Systems?. In Proceedings of the Seventh Workshop on Syntax, Semantics and Structure in Statistical Translation (pp. 11-18). https://aclanthology.org/W13-0802

  40. Singh TD, Hujon AV (2020) Low Resource and Domain Specific English to Khasi SMT and NMT Systems. In 2020 International Conference on Computational Performance Evaluation (ComPE) (pp. 733-737). IEEE. https://doi.org/10.1109/ComPE49325.2020.9200059

  41. Singh TD, España i Bonet C, Bandyopadhyay S, van Genabith J (eds) (2021) Proceedings of the First Workshop on Multimodal Machine Translation for Low Resource Languages (MMTLRL 2021). https://aclanthology.org/2021.mmtlrl-1

  42. Singh SM, Singh TD (2022) An empirical study of low-resource neural machine translation of manipuri in multilingual settings. Neural Computing and Applications 34(17):14823–14844. https://doi.org/10.1007/s00521-022-07337-8


  43. Singh A, Singh TD, Bandyopadhyay S (2022) V2T: video to text framework using a novel automatic shot boundary detection algorithm. Multimedia Tools and Applications 81(13):17989–18009. https://doi.org/10.1007/s11042-022-12343-y


  44. Singh S, Singh TD, Bandyopadhyay S (2022) An Experiment on Speech-to-Speech Translation of Hindi to English: A Deep Learning Approach. In Advanced Machine Intelligence and Signal Processing (pp. 625-635). Singapore: Springer Nature Singapore. https://doi.org/10.1007/978-981-19-0840-8_48

  45. Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J (2006) A study of translation edit rate with targeted human annotation. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers (pp. 223-231). https://aclanthology.org/2006.amta-papers.25

  46. Snover M, Madnani N, Dorr B, Schwartz R (2009) Fluency, adequacy, or HTER? Exploring different human judgments with a tunable MT metric. In Proceedings of the Fourth Workshop on Statistical Machine Translation (pp. 259-268). https://dl.acm.org/doi/abs/10.5555/1626431.1626480

  47. Sperber M, Neubig G, Niehues J, Waibel A (2019) Attention-passing models for robust and data-efficient end-to-end speech translation. Transactions of the Association for Computational Linguistics 7:313–325. https://doi.org/10.1162/tacl_a_00270


  48. Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Proceedings of the 27th International Conference on Neural Information Processing Systems-Volume 2 (pp. 3104-3112). https://doi.org/10.5555/2969033.2969173

  49. Tillmann C, Ney H (2003) Word reordering and a dynamic programming beam search algorithm for statistical machine translation. Computational linguistics 29(1):97–133. https://doi.org/10.1162/089120103321337458


  50. Toral A, Wieling M, Way A (2018) Post-editing effort of a novel with statistical and neural machine translation. Frontiers in Digital Humanities 5:9. https://doi.org/10.3389/fdigh.2018.00009


  51. Wang X, Wu J, Chen J, Li L, Wang YF, Wang WY (2019) Vatex: A large-scale, high-quality multilingual dataset for video-and-language research. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 4581-4591). https://doi.org/10.1109/ICCV.2019.00468

  52. Wang D, Xiong D (2021) Efficient Object-Level Visual Context Modeling for Multimodal Machine Translation: Masking Irrelevant Objects Helps Grounding. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, No. 4, pp. 2720-2728). https://doi.org/10.1609/aaai.v35i4.16376

  53. Weiss RJ, Chorowski J, Jaitly N, Wu Y, Chen Z (2017) Sequence-to-Sequence Models Can Directly Translate Foreign Speech. Proc. Interspeech 2017, 2625–2629. https://doi.org/10.21437/Interspeech.2017-503

  54. Yao BZ, Yang X, Lin L, Lee MW, Zhu SC (2010) I2t: Image parsing to text description. Proceedings of the IEEE 98(8):1485–1508. https://doi.org/10.1109/JPROC.2010.2050411


  55. Young P, Lai A, Hodosh M, Hockenmaier J (2014) From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics 2:67–78. https://doi.org/10.1162/tacl_a_00166



Author information


Corresponding author

Correspondence to Loitongbam Sanayai Meetei.

Ethics declarations

Conflicts of interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Meetei, L.S., Singh, T.D. & Bandyopadhyay, S. Exploiting multiple correlated modalities can enhance low-resource machine translation quality. Multimed Tools Appl 83, 13137–13157 (2024). https://doi.org/10.1007/s11042-023-15721-2
