Effective lexicon-based approach for Urdu sentiment analysis

Artificial Intelligence Review

Abstract

A lexicon-based approach is used for sentiment analysis of Urdu. Beyond the adjectives, nouns and negations found in traditional lexicons, we also include verbs, intensifiers and context-dependent words. An Urdu sentiment analyzer is developed that applies rules to this new lexicon and classifies sentences as positive, negative or neutral. Evaluated on sentences from Urdu blogs, the analyzer yields the most promising results reported so far for the Urdu language: 89.03% accuracy, with 0.86 precision, 0.90 recall and 0.88 F-measure. The results are also evaluated using the kappa statistic. A comparison with previous work in Urdu shows that the combination of this Urdu sentiment lexicon and Urdu sentiment analyzer is much more effective than earlier such combinations. The main reasons for the improvement are the wide-coverage lexicon and the analyzer's effective handling of negations, intensifiers and context-dependent words.
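To make the idea concrete, the following is a minimal sketch of a rule-based, lexicon-driven sentence classifier. It is not the authors' analyzer. The romanized tokens, the polarity weights, the intensifier multiplier and the two positional rules (an intensifier immediately before a sentiment word scales it, a negation immediately after it flips it) are all illustrative assumptions, and context-dependent words are left out. The `f_measure` helper only checks that the reported F-measure matches the harmonic mean of the reported precision and recall.

```python
# Toy lexicon: every entry and weight is an illustrative assumption,
# not taken from the paper's Urdu sentiment lexicon.
POLARITY = {"acha": 1.0, "khubsurat": 1.0, "bura": -1.0, "kharab": -1.0}
INTENSIFIERS = {"bohat": 2.0, "nihayat": 2.0}  # scale the following sentiment word
NEGATIONS = {"nahin", "na"}                    # flip the preceding sentiment word


def classify(tokens):
    """Return 'positive', 'negative' or 'neutral' for a tokenized sentence."""
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok not in POLARITY:
            continue
        value = POLARITY[tok]
        # Assumed rule: an intensifier directly before the word scales it.
        if i > 0 and tokens[i - 1] in INTENSIFIERS:
            value *= INTENSIFIERS[tokens[i - 1]]
        # Assumed rule: a negation directly after the word flips its sign.
        if i + 1 < len(tokens) and tokens[i + 1] in NEGATIONS:
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(classify("yeh film bohat acha hai".split()))  # positive
print(classify("yeh film acha nahin hai".split()))  # negative
print(classify("yeh film hai".split()))             # neutral
print(round(f_measure(0.86, 0.90), 2))              # 0.88
```

Summing signed scores and thresholding at zero is one simple way to reach a three-way decision. A real analyzer would need proper tokenization of Urdu script and a much richer rule set.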

Notes

  1. http://urdulughat.info/.

  2. http://www.cle.org.pk/.

  3. http://www.cle.org.pk/software/langproc/POStagset.htm.

Acknowledgements

The authors are very thankful to Mr. Al-Gaili for his valuable discussions, suggestions and help throughout this research.

Author information

Corresponding author

Correspondence to Neelam Mukhtar.

About this article

Cite this article

Mukhtar, N., Khan, M.A. Effective lexicon-based approach for Urdu sentiment analysis. Artif Intell Rev 53, 2521–2548 (2020). https://doi.org/10.1007/s10462-019-09740-5

  • DOI: https://doi.org/10.1007/s10462-019-09740-5
