Abstract
This work applies a lexicon-based approach to sentiment analysis of Urdu. Beyond the adjectives, nouns and negations that sentiment lexicons traditionally contain, the lexicon also includes verbs, intensifiers and context-dependent words. A rule-based Urdu sentiment analyzer uses this lexicon to classify sentences as positive, negative or neutral. Evaluated on sentences from Urdu blogs, the analyzer achieves 89.03% accuracy, 0.86 precision, 0.90 recall and 0.88 F-measure, the most promising results reported so far for Urdu. The results are also evaluated using the kappa statistic. A comparison with previous work in Urdu shows that this combination of sentiment lexicon and sentiment analyzer is considerably more effective than earlier such combinations. The main reasons for the improved performance are the wide-coverage lexicon and the analyzer's effective handling of negations, intensifiers and context-dependent words.
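To make the idea of a rule-based, lexicon-driven classifier concrete, the following is a minimal sketch and not the authors' analyzer. The lexicon entries, the polarity scores, the intensifier weight of 1.5, and the two rules (an intensifier scales the word that immediately follows it; a negation flips the most recent polar word) are all illustrative assumptions. Context-dependent words are not handled here.

```python
# Illustrative constants; every entry and score below is an assumption.
VERY = "بہت"    # intensifier, "very"
NOT = "نہیں"    # negation, "not"
GOOD = "اچھی"   # "good"
BAD = "خراب"    # "bad"

LEXICON = {GOOD: 1.0, BAD: -1.0}   # word -> polarity score
INTENSIFIERS = {VERY: 1.5}         # word -> multiplier for the next word
NEGATIONS = {NOT}


def classify(sentence: str) -> str:
    """Classify a whitespace-tokenized sentence as positive, negative or neutral."""
    scores = []   # one score per polar word found so far
    boost = 1.0   # multiplier carried from an immediately preceding intensifier
    for token in sentence.split():
        if token in INTENSIFIERS:
            boost = INTENSIFIERS[token]
            continue
        if token in LEXICON:
            scores.append(LEXICON[token] * boost)
        elif token in NEGATIONS and scores:
            # Toy rule: a negation flips the most recent polar word.
            scores[-1] = -scores[-1]
        boost = 1.0  # an intensifier only affects the very next token
    total = sum(scores)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


# "This film is very good" / "This film is not good" / "This is a film"
print(classify(" ".join(["یہ", "فلم", VERY, GOOD, "ہے"])))
print(classify(" ".join(["یہ", "فلم", GOOD, NOT, "ہے"])))
print(classify(" ".join(["یہ", "ایک", "فلم", "ہے"])))
```

A real system would need proper Urdu tokenization and a far larger lexicon; the sketch only shows how negation and intensifier rules can change the polarity that a plain word lookup would assign.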
Acknowledgements
The authors are very thankful to Mr. Al-Gaili for his valuable discussions, suggestions and help throughout this research.
About this article
Cite this article
Mukhtar, N., Khan, M.A. Effective lexicon-based approach for Urdu sentiment analysis. Artif Intell Rev 53, 2521–2548 (2020). https://doi.org/10.1007/s10462-019-09740-5
DOI: https://doi.org/10.1007/s10462-019-09740-5