The rapid increase in the use of social media has led to the generation of gigabytes of information shared by billions of users worldwide. Researchers widely use sentiment analysis to analyze this information and determine how people respond to different events. Existing studies in Urdu sentiment analysis mostly use traditional n-gram features, which, unlike linguistic features, do not capture the context of what is being discussed. Moreover, no existing study classifies the sentiment of proverbs and idioms, which is challenging because most of them carry strong sentiment without containing sentiment words. This study exploits linguistic features of the Urdu language for sentence-level sentiment analysis and classifies idioms and proverbs using classical machine learning techniques. We develop a dataset comprising idioms, proverbs, and sentences from the news domain, and, after careful linguistic analysis of the Urdu language, extract part-of-speech tag-based features, boolean features, and numeric features from it. Experimental results show that the J48 classifier performs best in sentiment classification, with an accuracy of 90% and an F-measure of 88%.

Data Availability
Not applicable.
Code Availability
Not applicable.
Funding
No grants, funds, or other support were received.
Author Contributions
• Amna Altaf: Conceptualization, Data Curation, Formal Analysis, Investigation, Methodology, Resources, Software, Writing - Original Draft.
• Muhammad Waqas Anwar: Visualization, Supervision, Project Administration, Funding Acquisition, Writing - Review and Editing, Investigation, Validation.
• Muhammad Hasan Jamal: Acquisition, Writing - Review and Editing, Investigation, Validation.
• Usama Ijaz Bajwa: Acquisition, Writing - Review and Editing, Investigation, Validation.
Competing Interests
We have no financial or personal relationships with other people or organizations.
Conflict of Interest
The authors declare no conflict of interest related to the content of this article.
