Building Spell-Check Dictionary for Low-Resource Language by Comparing Word Usage | IEEE Conference Publication | IEEE Xplore

Abstract:

Every language has its own vocabulary, used by a corresponding community of speakers. Natural Language Processing (NLP) methods generally perform better for languages with richer resources, whereas many low-resource languages lack enough annotated data for NLP methods to be applied effectively. A spell checker is nevertheless essential for writing documents in any language: it identifies words that are orthographically and grammatically correct and flags words that are misspelled. This paper presents a spell-check dictionary for the Albanian language, built by comparing word usage across various texts. Candidate dictionary entries are selected from a large text collection gathered for the experiments, and their usage frequencies are then compared. The corpora contain 49k Albanian sentences from diverse fields, including computer science, economics, law, medicine, politics, tourism, art, and psychology. The resulting dictionary should make Albanian easier to use in electronic media. Because Albanian is a low-resource language, a further aim of this paper and of related research is to build a larger, higher-quality Albanian corpus on which the spell-check dictionary can be continuously extended and refined.
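The abstract does not state the exact rule for admitting a word to the dictionary. The following is a minimal Python sketch of frequency-based dictionary building, assuming a hypothetical criterion: a word is kept if its total frequency reaches `min_count` and it occurs in at least `min_domains` different fields. The names `build_dictionary`, `check`, `min_count`, and `min_domains` are illustrative, not the authors' method.

```python
import re
from collections import Counter

# Unicode-aware word pattern: runs of letters only (covers Albanian ë and ç),
# excluding digits and underscores.
WORD_RE = re.compile(r"[^\W\d_]+")


def tokenize(text: str) -> list[str]:
    """Lowercase a text and split it into word tokens."""
    return WORD_RE.findall(text.lower())


def build_dictionary(corpus_by_domain: dict[str, list[str]],
                     min_count: int = 2,
                     min_domains: int = 2) -> dict[str, int]:
    """Hypothetical selection rule: keep a word if its total frequency is at
    least min_count and it appears in at least min_domains distinct domains."""
    total: Counter = Counter()
    domains_seen: dict[str, set[str]] = {}
    for domain, sentences in corpus_by_domain.items():
        for sentence in sentences:
            for word in tokenize(sentence):
                total[word] += 1
                domains_seen.setdefault(word, set()).add(domain)
    return {w: c for w, c in total.items()
            if c >= min_count and len(domains_seen[w]) >= min_domains}


def check(text: str, dictionary: dict[str, int]) -> list[str]:
    """Return the tokens of text that are absent from the dictionary
    (candidate misspellings)."""
    return [w for w in tokenize(text) if w not in dictionary]


if __name__ == "__main__":
    corpus = {
        "economics": ["Tregu është i hapur.", "Çmimi i tregut është i lartë."],
        "law": ["Ligji është i qartë.", "Gjykata e hapur."],
    }
    d = build_dictionary(corpus)
    print(d)                                  # {'i': 4, 'hapur': 2, 'është': 3} (order may differ)
    print(check("Tregu eshte i hapur", d))    # ['tregu', 'eshte']
```

In this toy run, "eshte" is correctly flagged as a misspelling of "është", but the valid word "tregu" is also flagged because it occurs only once. Raising the thresholds improves precision at the cost of coverage, which is one reason a larger corpus would help.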
Date of Conference: 27 September 2021 - 01 October 2021
Date Added to IEEE Xplore: 15 November 2021
Electronic ISSN: 2623-8764
Conference Location: Opatija, Croatia