Document Alignment for Generation of English-Punjabi Comparable Corpora from Wikipedia

Vishal Goyal, Ajit Kumar, Manpreet Singh Lehal
Copyright: © 2020 | Volume: 12 | Issue: 1 | Pages: 10
ISSN: 1937-9633 | EISSN: 1937-9641 | EISBN13: 9781799805656 | DOI: 10.4018/IJEA.2020010104
Cite Article

MLA

Goyal, Vishal, et al. "Document Alignment for Generation of English-Punjabi Comparable Corpora from Wikipedia." IJEA vol.12, no.1 2020: pp.42-51. http://doi.org/10.4018/IJEA.2020010104

APA

Goyal, V., Kumar, A., & Lehal, M. S. (2020). Document Alignment for Generation of English-Punjabi Comparable Corpora from Wikipedia. International Journal of E-Adoption (IJEA), 12(1), 42-51. http://doi.org/10.4018/IJEA.2020010104

Chicago

Goyal, Vishal, Ajit Kumar, and Manpreet Singh Lehal. "Document Alignment for Generation of English-Punjabi Comparable Corpora from Wikipedia," International Journal of E-Adoption (IJEA) 12, no.1: 42-51. http://doi.org/10.4018/IJEA.2020010104

Abstract

Comparable corpora serve as an alternative to parallel corpora for languages in which parallel corpora are scarce. Models trained on comparable corpora perform less well than those trained on parallel corpora; however, comparable corpora go a long way toward compensating for the shortage of parallel data in machine translation. In this article, the authors explore Wikipedia as a potential source and delineate the process of aligning documents, which will then be used to extract parallel data. The parallel data thus extracted will help to enhance the performance of statistical machine translation.
