Lexicon Based Sentiment Analysis of Urdu Text Using SentiUnits

Syed, Afraz Z.; Aslam, Muhammad; Martinez-Enriquez, Ana Maria

doi:10.1007/978-3-642-16761-4_4


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6437)

Abstract

Urdu websites, like those in other languages, are growing in popularity because people prefer to share opinions and express sentiments in their own language. Sentiment analyzers developed for well-studied languages such as English do not work for Urdu because of differences in script, morphology, and grammar. Urdu should therefore be studied as an independent problem domain. Our approach to sentiment analysis identifies and extracts SentiUnits from a given text using shallow parsing. SentiUnits are the expressions that carry the sentiment information in a sentence. We use a sentiment-annotated, lexicon-based approach. Unfortunately, no such lexicon exists for Urdu, so a major part of this research consists of developing one. This paper is therefore presented as a baseline for this large and complex task. Our goal is to highlight the linguistic (grammar and morphology) as well as the technical aspects of this multidimensional research problem. The system is evaluated on multiple texts, and the results are quite satisfactory.
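As a rough illustration of the general idea of lexicon-based scoring over extracted sentiment expressions, the following is a minimal sketch and not the authors' system. Everything in it is an assumption made for illustration: the toy lexicon, the romanized Urdu tokens (`acha` "good", `bura` "bad", `bohat` "very", `nahi` "not"), the numeric scores, the simple adjacency rules that stand in for shallow parsing, and the names `SentiUnit`, `extract_senti_units`, and `classify`.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical toy lexicon: romanized Urdu tokens with made-up polarity scores.
LEXICON = {"acha": 1.0, "bura": -1.0}
# Hypothetical intensifiers: scale the score of the sentiment word that follows.
INTENSIFIERS = {"bohat": 2.0}
# Hypothetical negation markers: flip the sign of the sentiment word they follow.
NEGATIONS = {"nahi"}


@dataclass
class SentiUnit:
    """A sentiment-bearing expression and its combined polarity score."""
    words: List[str]
    score: float


def extract_senti_units(tokens: List[str]) -> List[SentiUnit]:
    """Group each lexicon word with an adjacent intensifier and/or negation.

    This adjacency heuristic is a stand-in for shallow parsing.
    """
    units = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in LEXICON:
            words = [tok]
            score = LEXICON[tok]
            # An intensifier directly before the sentiment word scales it.
            if i > 0 and tokens[i - 1] in INTENSIFIERS:
                words.insert(0, tokens[i - 1])
                score *= INTENSIFIERS[tokens[i - 1]]
            # A negation directly after the sentiment word flips its sign.
            if i + 1 < len(tokens) and tokens[i + 1] in NEGATIONS:
                words.append(tokens[i + 1])
                score = -score
                i += 1  # the negation token is consumed by this unit
            units.append(SentiUnit(words, score))
        i += 1
    return units


def classify(sentence: str) -> str:
    """Label a whitespace-tokenized sentence by the sum of its unit scores."""
    total = sum(u.score for u in extract_senti_units(sentence.split()))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"
```

Under these assumptions, `classify("yeh khana bohat acha hai")` yields a positive label and `classify("yeh khana acha nahi hai")` a negative one. Real Urdu text would additionally need script-aware word segmentation and morphological handling, which this sketch does not attempt.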



Author information

Authors and Affiliations

Department of CS & E, U.E.T., Lahore, Pakistan
Afraz Z. Syed & Muhammad Aslam
Department of CS, CINVESTAV-IPN, D.F., Mexico
Ana Maria Martinez-Enriquez


Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Centro de Investigación en Computación, Av. Juan Dios Batiz, s/n, Zacatenco, 07738, Mexico City, México
Grigori Sidorov
Area de Computación, Centro de Investigación en Matemáticas (CIMAT), Callejón de Jalisco s/n, Mineral de Valenciana, 36240, Guanajuato, México
Arturo Hernández Aguirre
Instituto Nacional de Astrofísica, Optica y Electrónica (INAOE), Ciencias Computacionales, Luis Enrique Erro No. 1, 72840, Santa María Tonantzintla, Puebla, México
Carlos Alberto Reyes García


About this paper

Cite this paper

Syed, A.Z., Aslam, M., Martinez-Enriquez, A.M. (2010). Lexicon Based Sentiment Analysis of Urdu Text Using SentiUnits. In: Sidorov, G., Hernández Aguirre, A., Reyes García, C.A. (eds) Advances in Artificial Intelligence. MICAI 2010. Lecture Notes in Computer Science, vol 6437. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16761-4_4


DOI: https://doi.org/10.1007/978-3-642-16761-4_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16760-7
Online ISBN: 978-3-642-16761-4
eBook Packages: Computer Science, Computer Science (R0)
