Exploiting Linguistic Features for Effective Sentence-Level Sentiment Analysis in Urdu Language

Multimedia Tools and Applications

Abstract

The rapid increase in the use of social media has led to the generation of gigabytes of information shared by billions of users worldwide. Researchers widely use sentiment analysis to analyze this information and determine how people respond to different events. Existing studies in Urdu sentiment analysis mostly use traditional n-gram features, which, unlike linguistic features, do not capture the context of what is being discussed. Moreover, no existing study classifies the sentiment of proverbs and idioms, which is challenging because they mostly carry strong sentiment without containing sentiment words. This study exploits linguistic features of the Urdu language for sentence-level sentiment analysis and classifies idioms and proverbs using classical machine learning techniques. We develop a dataset comprising idioms, proverbs, and sentences from the news domain, and extract part-of-speech tag-based features, boolean features, and numeric features from it after a careful linguistic analysis of the Urdu language. Experimental results show that the J48 classifier performs best in sentiment classification, with an accuracy of 90% and an F-measure of 88%.
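
The abstract names three feature families (part-of-speech tag-based, boolean, and numeric) and two evaluation measures (accuracy and F-measure) without listing the concrete features. The following is a minimal sketch of what such a pipeline could look like, not the authors' implementation. The tag name `ADJ`, the negation token set, the specific features, and all function names are assumptions made for illustration. The sketch expects sentences that are already POS-tagged and stops short of training a classifier; J48 is Weka's implementation of the C4.5 decision tree, and the feature dictionaries produced here would be its input.

```python
from collections import Counter

# Hypothetical tag name and negation marker, chosen only for illustration;
# the paper's actual tagset and feature list are not given in the abstract.
ADJECTIVE_TAG = "ADJ"
NEGATION_TOKENS = {"نہیں"}


def extract_features(tagged_sentence):
    """Map a list of (token, pos_tag) pairs to a flat feature dictionary."""
    tokens = [token for token, _ in tagged_sentence]
    tag_counts = Counter(tag for _, tag in tagged_sentence)

    # POS tag-based features: one count per tag seen in the sentence.
    features = {f"count_{tag}": n for tag, n in tag_counts.items()}
    # Boolean features: presence of an adjective and of a negation word.
    features["has_adjective"] = tag_counts[ADJECTIVE_TAG] > 0
    features["has_negation"] = any(t in NEGATION_TOKENS for t in tokens)
    # Numeric feature: sentence length in tokens.
    features["num_tokens"] = len(tokens)
    return features


def accuracy(gold, predicted):
    """Fraction of sentences whose predicted label matches the gold label."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def f_measure(gold, predicted, positive):
    """Harmonic mean of precision and recall for the class `positive`."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Calling `extract_features` on a tagged sentence yields one dictionary per sentence, and `accuracy` and `f_measure` compare a list of predicted labels against the gold labels. For a multi-class setting, the per-class F-measures would still need to be averaged, and the abstract does not say which averaging scheme the reported 88% uses.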

Fig. 1

Data Availability

Not applicable.

Code Availability

Not applicable.

Notes

  1. https://www.statista.com/statistics/264810/number-of-monthly-active-facebook-users-worldwide/

  2. http://www.urdu2eng.com/idioms.php

Funding

No grants, funds, or other support were received.

Author information

Contributions

• Amna Altaf: Conceptualization, Data Curation, Formal Analysis, Investigation, Methodology, Resources, Software, Writing (Original Draft).

• Muhammad Waqas Anwar: Visualization, Supervision, Project Administration, Funding Acquisition, Writing (Review and Editing), Investigation, Validation.

• Muhammad Hasan Jamal: Acquisition, Writing (Review and Editing), Investigation, Validation.

• Usama Ijaz Bajwa: Acquisition, Writing (Review and Editing), Investigation, Validation.

Corresponding author

Correspondence to Amna Altaf.

Ethics declarations

Competing Interests

We have no financial or personal relationships with other people or organizations.

Conflict of Interest

The authors declare no conflict of interest related to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Altaf, A., Anwar, M.W., Jamal, M.H. et al. Exploiting Linguistic Features for Effective Sentence-Level Sentiment Analysis in Urdu Language. Multimed Tools Appl 82, 41813–41839 (2023). https://doi.org/10.1007/s11042-023-15216-0

  • DOI: https://doi.org/10.1007/s11042-023-15216-0

Keywords