Abstract
As with other languages, websites in Urdu are becoming more popular, because people prefer to share opinions and express sentiments in their own language. Sentiment analyzers developed for well-studied languages such as English do not work for Urdu because of differences in script, morphology, and grammar. As a result, Urdu should be studied as an independent problem domain. Our approach to sentiment analysis is based on identifying and extracting SentiUnits from the given text using shallow parsing. SentiUnits are the expressions that carry the sentiment information in a sentence. We use a sentiment-annotated, lexicon-based approach. Unfortunately, no such lexicon exists for Urdu, so a major part of this research consists of developing one. This paper is therefore presented as a baseline for this large and complex task. Our goal is to highlight the linguistic (grammar and morphology) as well as the technical aspects of this multidimensional research problem. The performance of the system is evaluated on multiple texts, and the achieved results are quite satisfactory.
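To make the idea of a lexicon-based pass over SentiUnits concrete, the following is a minimal sketch and not the authors' system. It assumes a tiny hand-made lexicon of romanized Urdu words with illustrative polarity values, and it replaces shallow parsing with a simple adjacency check (an intensifier directly before a sentiment word, a negation directly after it). The names `LEXICON`, `INTENSIFIERS`, `NEGATIONS`, `SentiUnit`, `extract_senti_units`, and `classify` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon in romanized Urdu; the polarity values are
# illustrative assumptions, not entries from the paper's lexicon.
LEXICON = {"acha": 1.0, "achi": 1.0, "bura": -1.0, "buri": -1.0}
INTENSIFIERS = {"bohat": 1.5}  # "very"; the weight is an assumption
NEGATIONS = {"nahi"}           # "not"


@dataclass
class SentiUnit:
    words: list      # tokens that make up the sentiment expression
    polarity: float  # signed score after intensifier and negation


def extract_senti_units(tokens):
    """Collect sentiment words together with an adjacent intensifier/negation.

    This adjacency check stands in for shallow parsing, which a real system
    would use to find the boundaries of each expression.
    """
    units = []
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        words = [tok]
        score = LEXICON[tok]
        # An intensifier is assumed to directly precede the sentiment word.
        if i > 0 and tokens[i - 1] in INTENSIFIERS:
            score *= INTENSIFIERS[tokens[i - 1]]
            words.insert(0, tokens[i - 1])
        # A negation is assumed to directly follow the sentiment word.
        if i + 1 < len(tokens) and tokens[i + 1] in NEGATIONS:
            score = -score
            words.append(tokens[i + 1])
        units.append(SentiUnit(words, score))
    return units


def classify(tokens):
    """Sum the unit polarities and map the total to a label."""
    total = sum(unit.polarity for unit in extract_senti_units(tokens))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"
```

For example, under these assumptions `classify("yeh film achi nahi hai".split())` treats `achi nahi` as one negated unit. A realistic system would also need Urdu-script tokenization and morphological handling of inflected forms, which this sketch sidesteps by listing the inflected forms directly in the toy lexicon.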
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Syed, A.Z., Aslam, M., Martinez-Enriquez, A.M. (2010). Lexicon Based Sentiment Analysis of Urdu Text Using SentiUnits. In: Sidorov, G., Hernández Aguirre, A., Reyes García, C.A. (eds) Advances in Artificial Intelligence. MICAI 2010. Lecture Notes in Computer Science, vol 6437. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16761-4_4
DOI: https://doi.org/10.1007/978-3-642-16761-4_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16760-7
Online ISBN: 978-3-642-16761-4
eBook Packages: Computer Science (R0)