Addressing the Issue of Unavailability of Parallel Corpus Incorporating Monolingual Corpus on PBSMT System for English-Manipuri Translation

Achom, Amika; Pakray, Partha; Gelbukh, Alexander

doi:10.1007/978-3-031-23793-5_25

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13396)

Included in the following conference series:

International Conference on Computational Linguistics and Intelligent Text Processing

Abstract

This paper shows how a Phrase-Based Statistical Machine Translation (PBSMT) system for the English-Manipuri language pair can be improved by incorporating a monolingual corpus on the target (Manipuri) side. Little prior work has focused on translation for Manipuri, a minority Tibeto-Burman language of India. The PBSMT system was developed with the Moses open-source toolkit and evaluated using both automatic and human evaluation techniques. Using the same training, tuning, and test datasets, the proposed system achieves a BLEU score of 10.15, compared with 9.89 for the baseline PBSMT system. The work thus addresses the limited availability of English-Manipuri parallel text corpora.
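
BLEU (Papineni et al., 2002), the automatic metric behind the reported scores of 10.15 and 9.89, combines clipped n-gram precisions (n = 1 to 4) with a brevity penalty. The following is a minimal, illustrative Python sketch of corpus-level BLEU. It assumes whitespace-tokenised sentences and a single reference translation per sentence, and the function names are hypothetical. It is not the evaluation script used by the authors.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per sentence (illustrative sketch).

    hypotheses, references: lists of token lists, aligned by index.
    Returns BLEU on a 0-100 scale, the scale used for scores such as 10.15.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # number of hypothesis n-grams, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram order has no match
    # Geometric mean of the modified precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalise output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    hyp = ["the cat sat on the mat".split()]
    ref = ["the cat sat on the mat".split()]
    print(corpus_bleu(hyp, ref))  # 100.0
```

Because tokenisation and smoothing choices differ between tools (for example, the Moses multi-bleu.perl script versus other implementations), this sketch should not be expected to reproduce the paper's exact scores. It only shows how a gap such as 10.15 versus 9.89 (0.26 BLEU points) arises from differences in n-gram overlap with the reference translations.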

Acknowledgments

I would like to express my deepest appreciation to the Technology Development for Indian Languages (TDIL) Programme, initiated by the Ministry of Electronics and Information Technology, Govt. of India, for sharing the English-Manipuri parallel corpus and the Manipuri monolingual corpus used in this paper. I would also like to extend my heartfelt gratitude to the Department of Computer Science and Engineering, National Institute of Technology Mizoram, for providing the financial assistance and laboratory facilities needed to carry out the experiments reported in this paper.

Author information

Authors and Affiliations

National Institute of Technology, Aizawl, Mizoram, India
Amika Achom & Partha Pakray
Centro de Investigación en Computación (CIC) of the Instituto Politécnico Nacional (IPN), Mexico City, Mexico
Alexander Gelbukh

Corresponding author

Correspondence to Amika Achom.

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Achom, A., Pakray, P., Gelbukh, A. (2023). Addressing the Issue of Unavailability of Parallel Corpus Incorporating Monolingual Corpus on PBSMT System for English-Manipuri Translation. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2018. Lecture Notes in Computer Science, vol 13396. Springer, Cham. https://doi.org/10.1007/978-3-031-23793-5_25

DOI: https://doi.org/10.1007/978-3-031-23793-5_25
Published: 26 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23792-8
Online ISBN: 978-3-031-23793-5
eBook Packages: Computer Science, Computer Science (R0)
