ABSTRACT
Part-of-speech (POS) tagging is the task of assigning a part-of-speech tag to each word of a text, and various approaches to it exist. This article presents a comprehensive study and comparison of two POS tagging techniques for Nepali text: one based on the Hidden Markov Model (HMM) and one based on the General Regression Neural Network (GRNN). The two taggers resolve the problem of ambiguity in POS tagging of Nepali text through these different approaches. The taggers are evaluated on the corpora developed and provided by TDIL (Technology Development for Indian Languages). Apart from the corpora, the Python and Java programming languages and the NLTK library were used for the implementation. Both taggers achieve an accuracy of 100 percent for known words with no ambiguity. For ambiguous words, the HMM tagger achieves 58.29 percent and the GRNN tagger 60.45 percent; for non-ambiguous unknown words, the GRNN tagger achieves 85.36 percent.
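To illustrate how an HMM tagger can resolve an ambiguous word from its context, here is a minimal sketch of Viterbi decoding over a bigram HMM. It is not the authors' implementation: it uses plain Python rather than NLTK, the function name `viterbi`, the `floor` smoothing value, the two-tag tagset, and all probabilities are hypothetical, and English tokens stand in for Nepali words. In the toy model the word "fish" is equally likely to be a noun or a verb, so the transition probabilities decide its tag.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable tag sequence for `words` under a bigram HMM.

    All tables are plain dicts of probabilities. Events missing from a table
    get the small probability `floor` (an assumed, very crude smoothing).
    """
    def lp(table, key):
        # Work in log space to avoid underflow on long sentences.
        return math.log(table.get(key, floor))

    # best[i][t] = (log-score of the best path ending in tag t at word i,
    #               previous tag on that path)
    best = [{t: (lp(start_p, t) + lp(emit_p, (t, words[0])), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            # Pick the best predecessor tag for t.
            score, prev = max(
                (best[-1][p][0] + lp(trans_p, (p, t)), p) for p in tags
            )
            column[t] = (score + lp(emit_p, (t, word)), prev)
        best.append(column)

    # Backtrack from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


# Hypothetical toy model (not estimated from the TDIL corpora).
TAGS = ["NOUN", "VERB"]
START_P = {"NOUN": 0.8, "VERB": 0.2}
TRANS_P = {
    ("NOUN", "NOUN"): 0.3, ("NOUN", "VERB"): 0.7,
    ("VERB", "NOUN"): 0.6, ("VERB", "VERB"): 0.4,
}
EMIT_P = {
    ("NOUN", "people"): 0.5,
    ("NOUN", "fish"): 0.5, ("VERB", "fish"): 0.5,  # ambiguous word
    ("VERB", "swim"): 0.5,
}

if __name__ == "__main__":
    print(viterbi(["people", "fish"], TAGS, START_P, TRANS_P, EMIT_P))
    print(viterbi(["fish", "swim"], TAGS, START_P, TRANS_P, EMIT_P))
```

In this sketch "fish" is tagged VERB after the noun "people" but NOUN at the start of "fish swim", because only the surrounding tags differ. A real tagger would estimate the three tables from an annotated corpus and use a more careful treatment of unknown words than a constant floor.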
Index Terms
- Probabilistic and Neural Network Based POS Tagging of Ambiguous Nepali text: A Comparative Study