Exploiting multiple correlated modalities can enhance low-resource machine translation quality

Meetei, Loitongbam Sanayai; Singh, Thoudam Doren; Bandyopadhyay, Sivaji

doi:10.1007/s11042-023-15721-2

Published: 05 July 2023

Volume 83, pages 13137–13157, (2024)

Multimedia Tools and Applications

Loitongbam Sanayai Meetei ORCID: orcid.org/0000-0002-9816-9108^1,2,
Thoudam Doren Singh^1,2 &
Sivaji Bandyopadhyay^2,3

Abstract

In an effort to improve the machine translation (MT) quality of low-resource languages, we report the first study on multimodal machine translation (MMT) for the Manipuri→English, Manipuri→Hindi and Manipuri→German language pairs. Manipuri is a morphologically rich, low-resource language with few resources that can be used computationally. No MMT dataset has been reported for these language pairs to date. To build the parallel datasets, we collected news articles containing images and associated English text from a local daily newspaper and used English as a pivot language. The outputs of existing MT systems for these languages were then manually post-edited to build the datasets. In addition to text, we build MT systems that exploit features from images and from audio recordings in the source language, Manipuri. We carried out an extensive analysis of the MT systems trained with text-only and multimodal inputs using automatic metrics and human evaluation. Our findings show that integrating multiple correlated modalities improves MT performance in low-resource settings, with a significant gain of up to +3 BLEU. The human assessment revealed that the fluency of the MMT systems depends on the type of correlated auxiliary modality.

Data Availability

The dataset used in this work is available from Imphal Free Press subject to a licensing agreement. Requests for access may be made to the authors, with permission from Imphal Free Press. A sample of the dataset is available on GitHub (https://github.com/LSMeetei/MnMultimodal) for reference.

Notes

http://censusindia.gov.in
https://ifp.co.in/
Acronyms: O = Object, S = Subject, V = Verb.
https://ifp.co.in/
https://indicnlp.ai4bharat.org/indic-trans/
https://www.deepl.com/
Also known as the F1-score: the harmonic mean of precision and recall.
https://anoopkunchukuttan.github.io/indic_nlp_library/
BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.5.1
chrF2+numchars.6+space.false+version.1.5.1
TER+tok.tercom-nonorm-punct-noasian-uncased+version.1.5.1
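The chrF2 signature above (`numchars.6`, `space.false`) and the F-score footnote can be illustrated with a minimal sketch. This is a simplified, sentence-level chrF written for illustration only, following Popović's definition (character n-gram precision and recall averaged over orders 1 to 6, combined with an F-beta score, beta = 2). It is not the sacreBLEU 1.5.1 implementation used for the reported scores, and corpus-level scores from sacreBLEU aggregate statistics over all sentences, so results will differ. The function names `char_ngrams`, `f_beta` and `sentence_chrf` are hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace (as with space.false)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall; beta=1 gives the F1-score."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def sentence_chrf(hypothesis: str, reference: str,
                  max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale (illustrative sketch only)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip n-gram orders longer than either string
        matches = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    avg_p = sum(precisions) / len(precisions)
    avg_r = sum(recalls) / len(recalls)
    return 100 * f_beta(avg_p, avg_r, beta)
```

For example, `sentence_chrf("the cat sat", "the cat sat")` returns 100.0, while a hypothesis that shares no characters with its reference scores 0.0. Setting beta = 2 weights recall more heavily than precision, which is what the "2" in chrF2 denotes.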

Author information

Authors and Affiliations

Center for Natural Language Processing (CNLP), National Institute of Technology Silchar, Assam, India
Loitongbam Sanayai Meetei & Thoudam Doren Singh
Department of Computer Science and Engineering, National Institute of Technology Silchar, Assam, India
Loitongbam Sanayai Meetei, Thoudam Doren Singh & Sivaji Bandyopadhyay
Department of Computer Science and Engineering, Jadavpur University, West Bengal, India
Sivaji Bandyopadhyay

Corresponding author

Correspondence to Loitongbam Sanayai Meetei.

Ethics declarations

Conflicts of interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Meetei, L.S., Singh, T.D. & Bandyopadhyay, S. Exploiting multiple correlated modalities can enhance low-resource machine translation quality. Multimed Tools Appl 83, 13137–13157 (2024). https://doi.org/10.1007/s11042-023-15721-2

Received: 30 August 2022
Revised: 13 April 2023
Accepted: 18 April 2023
Published: 05 July 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s11042-023-15721-2
