DOI: 10.1145/2909067.2909073
research-article

Punjabi Stop Words: A Gurmukhi, Shahmukhi and Roman Scripted Chronicle

Published: 21 March 2016

ABSTRACT

With the advent of Unicode encoding, Punjabi-language content written in both the Gurmukhi and Shahmukhi scripts is growing steadily on the internet. Processing such text involves passing it through several pre-processing phases, one of which is stop-word elimination. A total of 256 Gurmukhi stop words were identified from poetry, stories and online material and passed through a Punjabi stemmer, which reduced them to 184 stemmed stop words. These stemmed stop words were then passed to a transliteration phase, which generated their Shahmukhi forms. For the first time in the scientific community working on computational linguistics and NLP-based literature processing, a list of 184 Punjabi stop words is released for public use and further NLP applications. Each entry in the list gives the stop word in its Gurmukhi, Shahmukhi and Roman scripted forms.
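The abstract describes stop-word elimination as a pre-processing phase and reports that stemming reduced 256 stop words to 184 entries. The Python sketch below illustrates both ideas under stated assumptions: `toy_stem` is a hypothetical stemmer that strips one trailing Gurmukhi vowel sign (it is not the Punjabi stemmer used in the paper), `SAMPLE_STOP_WORDS` is a short illustrative sample of common Punjabi function words rather than the published list, and tokenization is plain whitespace splitting.

```python
# Hypothetical toy suffix set: three Gurmukhi dependent vowel signs (matras).
# Assumption for illustration only; this is NOT the stemmer used in the paper.
TOY_SUFFIXES = ("\u0a3e", "\u0a40", "\u0a47")  # aa, ii and ee vowel signs


def toy_stem(word: str) -> str:
    """Strip one trailing vowel sign from TOY_SUFFIXES, keeping at least one character."""
    for suffix in TOY_SUFFIXES:
        if len(word) > len(suffix) and word.endswith(suffix):
            return word[: -len(suffix)]
    return word


def build_stemmed_stop_set(stop_words: list[str]) -> set[str]:
    """Stem every stop word; variants that share a stem collapse into one entry."""
    return {toy_stem(w) for w in stop_words}


def remove_stop_words(text: str, stemmed_stops: set[str]) -> list[str]:
    """Whitespace-tokenize and drop tokens whose stem is in the stemmed stop set."""
    return [tok for tok in text.split() if toy_stem(tok) not in stemmed_stops]


# Illustrative sample of common Punjabi function words, not the published list.
SAMPLE_STOP_WORDS = ["ਦਾ", "ਦੀ", "ਦੇ", "ਨੂੰ", "ਅਤੇ", "ਹੈ", "ਵਿੱਚ"]

if __name__ == "__main__":
    stops = build_stemmed_stop_set(SAMPLE_STOP_WORDS)
    print(len(SAMPLE_STOP_WORDS), "->", len(stops))  # 7 -> 5
    # Removes the stop words ਦੀ and ਹੈ from the sentence.
    print(remove_stop_words("ਪੰਜਾਬ ਦੀ ਬੋਲੀ ਪੰਜਾਬੀ ਹੈ", stops))
```

Because variants such as ਦਾ, ਦੀ and ਦੇ share a stem under this toy rule, the stemmed set is smaller than the input list, mirroring on a small scale the reduction the abstract reports. Matching tokens by stem also means that a content word whose stem happened to coincide with a stop-word stem would be removed, so a real system would need a proper stemmer and a check for such collisions.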


  Published in

    ACM Other conferences
    WIR '16: Proceedings of the ACM Symposium on Women in Research 2016
    March 2016
    179 pages
    ISBN:9781450342780
    DOI:10.1145/2909067

    Copyright © 2016 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    WIR '16 Paper Acceptance Rate: 35 of 117 submissions, 30%
    Overall Acceptance Rate: 35 of 117 submissions, 30%
