
Addressing the Issue of Unavailability of Parallel Corpus Incorporating Monolingual Corpus on PBSMT System for English-Manipuri Translation

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2018)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13396)


Abstract

This paper shows that a Phrase-Based Statistical Machine Translation (PBSMT) system for English-to-Manipuri translation can be improved by incorporating a monolingual corpus on the target (Manipuri) side. However, no prior work has focused on translation for Manipuri, a Tibeto-Burman minority language of India. The PBSMT system was developed using the open-source Moses toolkit and evaluated using several automatic and human evaluation techniques. The PBSMT system with the monolingual corpus achieves a BLEU score of 10.15, compared to 9.89 for the baseline PBSMT system, using the same training, tuning, and testing datasets. The paper thereby addresses the limited availability of parallel text corpora for the English-Manipuri pair.
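The abstract compares the two systems by BLEU score. As a rough illustration of what that metric measures, the following is a minimal Python sketch of corpus-level BLEU with a single reference per sentence: clipped n-gram precisions up to 4-grams, combined by a geometric mean and scaled by a brevity penalty. It is not the evaluation script used in the paper; the function names, the whitespace tokenization, and the lack of smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Illustrative sketch only: whitespace tokenization, no smoothing.
    """
    matches = [0] * max_n   # clipped n-gram matches, per n-gram order
    totals = [0] * max_n    # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Counter intersection clips each hypothesis n-gram count
            # by its count in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any precision is zero

    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty lowers the score when the output is shorter
    # than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


# Toy usage with made-up sentences (not data from the paper).
score = corpus_bleu(["a b c d"], ["a b c d e"])
```

On this scale, the reported scores of 10.15 and 9.89 differ by 0.26 BLEU points.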


Notes

  1. http://www.statmt.org/moses/?n=Development.GetStarted.

  2. https://en.wikipedia.org/wiki/Languages-of-India.

  3. http://e-pao.net/.

  4. http://ildc.in/Manipuri/Mnindex.aspx.


Acknowledgments

I would like to express my deepest appreciation to the Technology Development for Indian Languages (TDIL) Programme, initiated by the Ministry of Electronics and Information Technology, Govt. of India, for sharing the parallel English-Manipuri corpus and the monolingual Manipuri corpus used in this paper. I would also like to extend my heartfelt gratitude to the Department of Computer Science and Engineering, National Institute of Technology, Mizoram, for providing the financial assistance and laboratory facilities required to carry out the experiments reported in this paper.

Author information


Corresponding author

Correspondence to Amika Achom.


Copyright information

© 2023 Springer Nature Switzerland AG

About this paper


Cite this paper

Achom, A., Pakray, P., Gelbukh, A. (2023). Addressing the Issue of Unavailability of Parallel Corpus Incorporating Monolingual Corpus on PBSMT System for English-Manipuri Translation. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2018. Lecture Notes in Computer Science, vol 13396. Springer, Cham. https://doi.org/10.1007/978-3-031-23793-5_25


  • DOI: https://doi.org/10.1007/978-3-031-23793-5_25


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23792-8

  • Online ISBN: 978-3-031-23793-5

  • eBook Packages: Computer Science, Computer Science (R0)
