Consensus-Based Machine Translation for Code-Mixed Texts

Published: 09 March 2024

Abstract

Multilingualism is widespread in India owing to its long history of foreign contact. As a result, many speakers are comfortable conversing in more than one language, and the social media boom has made communication in multiple languages even more common. A translation system that can serve novice and monolingual users is therefore urgently needed. Such systems can be built with statistical machine translation (SMT) or neural machine translation (NMT), each of which has its own advantages and disadvantages. In addition, the parallel corpus of code-mixed data needed to build such a system is not readily available. In the present work, we present two translation frameworks that leverage the individual advantages of these existing approaches through an ensemble model, which takes a consensus of their final outputs to generate the target output. The developed models were used to translate English-Bengali code-mixed data (written in Roman script) into its equivalent monolingual Bengali instances. A code-mixed to monolingual parallel corpus was also developed to train the preceding systems. Empirical results show improved scores for the two developed frameworks: BLEU 17.23 and TER 53.18 for the first, and BLEU 19.12 and TER 51.29 for the second.
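The abstract does not say how the consensus over system outputs is computed, so the following is only an illustrative sketch of one common way to do it, not the authors' method. It assumes each underlying system (for example an SMT and an NMT model) has already produced a candidate translation as a plain string, and it selects the candidate that agrees most with the others under a simple token-overlap F1. The function names `token_f1` and `consensus_pick` and the choice of similarity measure are hypothetical.

```python
from collections import Counter


def token_f1(hyp: str, ref: str) -> float:
    """Token-level F1 overlap between two whitespace-tokenized strings.

    Assumption: a crude stand-in for a real MT similarity metric.
    """
    hyp_tokens, ref_tokens = hyp.split(), ref.split()
    if not hyp_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(hyp_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def consensus_pick(candidates: list[str]) -> str:
    """Return the candidate with the highest mean similarity to the others.

    Hypothetical selection rule; ties go to the earliest candidate.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    if len(candidates) == 1:
        return candidates[0]

    def mean_agreement(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(token_f1(candidates[i], o) for o in others) / len(others)

    best = max(range(len(candidates)), key=mean_agreement)
    return candidates[best]


if __name__ == "__main__":
    # Placeholder tokens stand in for the outputs of three systems.
    outputs = ["a b c d", "a b c", "a b x y"]
    print(consensus_pick(outputs))
```

A selection rule like this is only informative with three or more candidates; with exactly two systems every candidate agrees equally with the other, so a real ensemble would need an additional signal such as model scores or a learned combiner.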

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 3
March 2024
277 pages
EISSN: 2375-4702
DOI: 10.1145/3613569

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Received: 05 April 2023
Revised: 03 September 2023
Accepted: 12 October 2023
Published: 09 March 2024

Author Tags

1. Phrase-based machine translation
2. Neural machine translation
3. Consensus
4. Code-mixed
5. Parallel corpus
6. Neural network
