ABSTRACT
With the advent of Unicode encoding, Punjabi-language content, written in both the Gurmukhi and Shahmukhi scripts, is increasing day by day on the internet. Processing textual information involves passing it through several pre-processing phases, of which stop-word elimination is one. We identified 256 Gurmukhi stop words from poetry, stories and online material and passed them to a Punjabi stemmer. Stemming produced 184 stemmed stop words, which were then passed to a transliteration phase to generate the corresponding stop words in Shahmukhi script. This is the first public release, for the research community working in computational linguistics and NLP-based literature processing, of a list of 184 Punjabi stop words for public use and further NLP applications. The list gives each stop word in its Gurmukhi, Shahmukhi and Roman-script forms.
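The pipeline described above (stem each stop word, keep one entry per stem, then transliterate) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function `build_stop_word_table` and the three callables are hypothetical names, and the toy stand-ins use ASCII placeholders rather than real Punjabi stemming or transliteration rules. It also assumes that the drop from 256 to 184 entries comes from several surface forms sharing one stem, which the abstract does not state explicitly.

```python
from typing import Callable, Dict, Iterable, List


def build_stop_word_table(
    stop_words: Iterable[str],
    stem: Callable[[str], str],
    to_shahmukhi: Callable[[str], str],
    to_roman: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Stem each stop word, drop duplicate stems, then transliterate.

    All three callables are assumptions supplied by the caller; a real
    system would plug in a Punjabi stemmer and transliteration tools.
    """
    seen = set()
    table: List[Dict[str, str]] = []
    for word in stop_words:
        stemmed = stem(word)
        if stemmed in seen:
            continue  # assumed: several surface forms collapse to one stem
        seen.add(stemmed)
        table.append(
            {
                "gurmukhi": stemmed,
                "shahmukhi": to_shahmukhi(stemmed),
                "roman": to_roman(stemmed),
            }
        )
    return table


# Toy stand-ins (placeholders only, NOT real Punjabi rules).
def toy_stem(word: str) -> str:
    return word[:-1] if word.endswith("s") else word


def toy_to_shahmukhi(word: str) -> str:
    return f"<shahmukhi:{word}>"


def toy_to_roman(word: str) -> str:
    return f"<roman:{word}>"


if __name__ == "__main__":
    rows = build_stop_word_table(
        ["was", "wa", "the", "thes"], toy_stem, toy_to_shahmukhi, toy_to_roman
    )
    for row in rows:
        print(row)
```

Passing the stemmer and transliterators in as parameters keeps the sketch independent of any particular tool, so the same skeleton would work with whichever stemmer or transliteration system is actually available.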