Abstract:
Natural Language Processing (NLP) has been applied to machine translation, chatbots, speech recognition, question answering systems, document summarization, and other tasks. The Dzongkha language of Bhutan, however, has not been considered in NLP systems, presumably because the language is complex and is written as a string of syllables without explicit word boundaries. Dzongkha word segmentation is therefore the essential first step in building NLP applications for the language. The novelty of our research lies in applying Deep Learning to Dzongkha word segmentation, which avoids the need for manual feature engineering. The segmentation problem is formulated as a syllable tagging task. We also incorporate a window approach, in which the tag of a syllable depends on its surrounding syllables. Two sets of experiments were designed, each with four models of varying context sizes. We evaluated our models on the syllable-tagged corpus prepared by the Dzongkha Development Commission. The model with context size 2 achieved the highest F-score of 94.40%, with 94.47% Precision and 94.35% Recall.
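The syllable tagging formulation and the window approach can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two-tag scheme (`B` for a word-initial syllable, `I` for any other), the `<PAD>` symbol, the function names, and the reading of "context size k" as k syllables on each side are all assumptions made for illustration, and the placeholder strings stand in for Dzongkha syllables.

```python
from typing import List, Tuple

PAD = "<PAD>"  # hypothetical padding symbol for positions outside the sentence


def words_to_tags(words: List[List[str]]) -> Tuple[List[str], List[str]]:
    """Flatten segmented words (each a list of syllables) into a syllable
    sequence and a parallel tag sequence (assumed scheme: B = word-initial,
    I = word-internal)."""
    syllables: List[str] = []
    tags: List[str] = []
    for word in words:
        for i, syllable in enumerate(word):
            syllables.append(syllable)
            tags.append("B" if i == 0 else "I")
    return syllables, tags


def context_windows(syllables: List[str], k: int) -> List[List[str]]:
    """Return, for each syllable, the window of k syllables to its left,
    the syllable itself, and k syllables to its right (padded at the edges)."""
    padded = [PAD] * k + syllables + [PAD] * k
    return [padded[i:i + 2 * k + 1] for i in range(len(syllables))]


def tags_to_words(syllables: List[str], tags: List[str]) -> List[List[str]]:
    """Rebuild words from a tagged syllable sequence: a B tag opens a new word."""
    words: List[List[str]] = []
    for syllable, tag in zip(syllables, tags):
        if tag == "B" or not words:
            words.append([syllable])
        else:
            words[-1].append(syllable)
    return words


# Placeholder syllables; a real input would be Dzongkha syllables.
segmented = [["s1", "s2"], ["s3"], ["s4", "s5", "s6"]]
syllables, tags = words_to_tags(segmented)
windows = context_windows(syllables, k=2)
```

In a setup like this, each window would be the input to a classifier that predicts the tag of the centre syllable, and `tags_to_words` would turn the predicted tags back into a segmentation.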
Date of Conference: 29 January 2020 - 01 February 2020
Date Added to IEEE Xplore: 09 April 2020
ISBN Information:
Print on Demand(PoD) ISSN: 2374-314X