Associating targets with SentiUnits: a step forward in sentiment analysis of Urdu text

Artificial Intelligence Review

Abstract

This paper presents a grammatically motivated sentiment classification model applied to a morphologically rich language, Urdu. The morphological complexity of this language and the flexibility of its grammatical rules require an improved or altogether different approach. We emphasize the identification of SentiUnits rather than subjective words in the given text. SentiUnits are the sentiment-carrying expressions that reveal the inherent sentiment of a sentence toward a specific target. The targets are the noun phrases about which an opinion is expressed. The system extracts SentiUnits and target expressions through chunking based on shallow parsing. A dependency parsing algorithm then creates associations between the extracted expressions. For our system, we develop a sentiment-annotated lexicon of Urdu words. Each entry of the lexicon is marked with its orientation (positive or negative) and its intensity (force of orientation) score. To evaluate the system, we collect two corpora of reviews, from the domains of movies and electronic appliances. The experimental results show that the system achieves state-of-the-art performance in sentiment analysis of Urdu text.
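To make the lexicon and the target association concrete, the following is a minimal sketch and not the authors' implementation. It assumes that chunking and dependency parsing have already produced (target, SentiUnit) pairs, that intensity lies on a 0 to 1 scale, and that a SentiUnit's score is the sum of orientation times intensity over its lexicon words. The transliterated words, the scores, and every name (`LexiconEntry`, `LEXICON`, `score_senti_unit`, `score_targets`, `classify`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    orientation: int   # +1 for positive, -1 for negative
    intensity: float   # force of orientation; a 0..1 scale is assumed here


# Hypothetical toy lexicon: transliterated words and scores are illustrative only.
LEXICON = {
    "acha": LexiconEntry(+1, 0.6),       # "good"
    "behtareen": LexiconEntry(+1, 0.9),  # "excellent"
    "kharab": LexiconEntry(-1, 0.7),     # "bad"
}


def score_senti_unit(tokens):
    """Sum orientation * intensity over the lexicon words in one SentiUnit."""
    return sum(
        LEXICON[t].orientation * LEXICON[t].intensity
        for t in tokens
        if t in LEXICON
    )


def score_targets(associations):
    """Accumulate a score per target.

    `associations` is a list of (target, senti_unit_tokens) pairs, assumed
    to come from an upstream chunking and dependency parsing step.
    """
    totals = {}
    for target, unit in associations:
        totals[target] = totals.get(target, 0.0) + score_senti_unit(unit)
    return totals


def classify(score):
    """Map a numeric score to a coarse polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    pairs = [("camera", ["bohat", "acha"]), ("battery", ["kharab"])]
    for target, score in score_targets(pairs).items():
        print(target, classify(score))
```

Scoring per target rather than per sentence is what allows a single review sentence to carry opposite opinions about two different noun phrases.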

Correspondence to Afraz Z. Syed.

Cite this article

Syed, A.Z., Aslam, M. & Martinez-Enriquez, A.M. Associating targets with SentiUnits: a step forward in sentiment analysis of Urdu text. Artif Intell Rev 41, 535–561 (2014). https://doi.org/10.1007/s10462-012-9322-6

  • DOI: https://doi.org/10.1007/s10462-012-9322-6