Effective Use of Evaluation Measures for the Validation of Best Classifier in Urdu Sentiment Analysis

  • Published in: Cognitive Computation

Abstract

Sentiment analysis (SA) can support decision making, drawing conclusions, or recommending appropriate solutions for business, political, and other problems. At the same time, reliable ways are required to verify the results obtained from SA. Within biologically inspired approaches to machine learning, obtaining reliable results is challenging but important, and properly verified and validated results are preferred by the research community. This research pursues reliable results by applying three standard evaluation measures. First, SA of Urdu is performed. After collection and annotation of the data, five classifiers, i.e., PART, Naive Bayes Multinomial Text, LibSVM (support vector machine), decision tree (J48), and k-nearest neighbor (KNN, implemented as IBk), are employed using Weka. After 10-fold cross-validation, the three top classifiers, i.e., LibSVM, J48, and IBk, are selected on the basis of high accuracy, precision, recall, and F-measure; among these, IBk emerges as the best. To verify this result, the labels of the sentences (positive, negative, or neutral) are predicted using separate training and test data, followed by the application of three standard evaluation measures: McNemar's test, the kappa statistic, and root mean squared error. IBk performs much better than the other two classifiers. A number of steps are taken to make this result more reliable, including the combined use of the three evaluation measures to obtain a confirmed and validated result, which is the main contribution of this research. It is concluded with confidence that IBk is the best classifier in this case.
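The three validation measures named above can be sketched in plain Python. This is a minimal illustration, not the authors' Weka pipeline: the continuity-corrected form of McNemar's statistic and the −1/0/+1 numeric mapping of the sentiment labels for RMSE are assumptions made for the sketch.

```python
from math import sqrt

def mcnemar_chi2(gold, pred_a, pred_b):
    """McNemar's chi-squared statistic (with continuity correction) for
    two classifiers evaluated on the same labeled sentences."""
    # b: A correct, B wrong; c: A wrong, B correct (the discordant pairs)
    b = sum(1 for g, a, p in zip(gold, pred_a, pred_b) if a == g and p != g)
    c = sum(1 for g, a, p in zip(gold, pred_a, pred_b) if a != g and p == g)
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)

def cohens_kappa(gold, pred):
    """Kappa statistic: agreement between predictions and gold labels,
    corrected for the agreement expected by chance."""
    n = len(gold)
    labels = set(gold) | set(pred)
    observed = sum(1 for g, p in zip(gold, pred) if g == p) / n
    expected = sum((gold.count(l) / n) * (pred.count(l) / n) for l in labels)
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)

def rmse(gold, pred, scores=None):
    """Root mean squared error after mapping the three sentiment labels
    onto a numeric scale (the -1/0/+1 scale is an assumption here)."""
    scores = scores or {"negative": -1, "neutral": 0, "positive": 1}
    n = len(gold)
    return sqrt(sum((scores[g] - scores[p]) ** 2
                    for g, p in zip(gold, pred)) / n)
```

For example, with gold labels `["positive", "negative", "neutral", "positive"]` and two classifiers that each make one (different) error, the two discordant pairs give a McNemar statistic of 0.5, i.e., no significant difference between the classifiers at that sample size. In practice Weka reports kappa and RMSE directly in its evaluation output.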


Notes

  1. http://www.cs.waikato.ac.nz/ml/weka/

  2. http://standardwisdom.com/softwarejournal/2011/12/confusion-matrix-another-single-value-metric-kappastatistic/

  3. http://statweb.stanford.edu/~susan/courses/s60/split/node60.html

References

  1. Cambria E, Schuller B, Xia Y, Havasi C. New avenues in opinion mining and sentiment analysis. IEEE Intell Syst. 2013;28(2):15–21.

  2. Palogiannidi E, Kolovou A, Christopoulou F, Kokkinos F, Iosif E, Malandrakis N, et al. Tweester at SemEval-2016 Task 4: sentiment analysis in Twitter using semantic-affective model adaptation. In: 10th International Workshop on Semantic Evaluation (SemEval 2016); 2016; San Diego, USA.

  3. Cambria E. Affective computing and sentiment analysis. IEEE Intell Syst. 2016;31(2):102–7.

  4. Ofek N, Rokach L, Cambria E, Hussain A, Shabtai A. Unsupervised commonsense knowledge enrichment for domain-specific sentiment analysis. Cogn Comput. 2016;8(3):467–77.

  5. Oneto L, Bisio F, Cambria E, Anguita D. Statistical learning theory and ELM for big social data analysis. IEEE Comput Intell Mag. 2016;11(3):45–55.

  6. Bautin M, Vijayarenu L, Skiena S. International sentiment analysis for news and blogs. In: Second International Conference on Weblogs and Social Media (ICWSM); 2008; Seattle, WA.

  7. Cambria E, Poria S, Bajpai R, Schuller B. SenticNet 4: a semantic resource for sentiment analysis based on conceptual primitives. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers; 2016; Japan.

  8. Appel O, Chiclana F, Carter J, Fujita H. A hybrid approach to the sentiment analysis problem at the sentence level. Knowl-Based Syst. 2016;108:110–24.

  9. Minhas S, Hussain A. From spin to swindle: identifying falsification in financial text. Cogn Comput. 2016;8:729–45.

  10. Khan FH, Qamar U, Bashir S. Multi-objective model selection (MOMS)-based semi-supervised framework for sentiment analysis. Cogn Comput. 2016;8(4):614–28.

  11. Dashtipour K, Poria S, Hussain A, Cambria E, Hawalah AYA, Gelbukh A, et al. Multilingual sentiment analysis: state of the art and independent comparison of techniques. Cogn Comput. 2016;8:757–71.

  12. Bilal M, Israr H, Shahid M, Khan A. Sentiment classification of Roman-Urdu opinions using Naïve Bayesian, decision tree and KNN classification techniques. J King Saud Univ Comput Inf Sci. 2015.

  13. Syed AZ, Muhammad A, Enríquez AMM. Lexicon based sentiment analysis of Urdu text using SentiUnits. In: Proceedings of the 9th Mexican International Conference on Artificial Intelligence (MICAI); 2010; Berlin Heidelberg: Springer.

  14. Syed AZ, Muhammad A, Enríquez AMM. Adjectival phrases as the sentiment carriers in Urdu. J Am Sci. 2011;7(3):644–52.

  15. Syed AZ, Muhammad A, Enríquez AMM. Associating targets with SentiUnits: a step forward in sentiment analysis of Urdu text. Artif Intell Rev. 2014;41(4):535–61.

  16. Daud M, Khan R, Daud A. Roman Urdu opinion mining system (RUOMiS). CSEIJ. 2014;4(6):1–9.

  17. Dietterich TG. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput. 1998;10:1895–923.

  18. Bouckaert RR, Frank E. Evaluating the replicability of significance tests for comparing learning algorithms. In: 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD); 2004.

  19. Demšar J. Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res. 2006;7:1–30.

  20. Bostanci B, Bostanci E. An evaluation of classification algorithms using McNemar's test. In: Seventh International Conference on Bio-Inspired Computing: Theories and Applications; 2013; New Delhi: Springer (Advances in Intelligent Systems and Computing).

  21. Westfall PH, Troendle JF, Pennello G. Multiple McNemar tests. Biometrics. 2010;66(4):1185–91.

  22. Vieira S, Kaymak U, Sousa J. Cohen's kappa coefficient as a performance measure for feature selection. In: IEEE International Conference on Fuzzy Systems (FUZZ-IEEE); 2010; Piscataway.

  23. Ben-David A. Comparison of classification accuracy using Cohen's weighted kappa. Expert Syst Appl. 2008;34(2):825–32.

  24. Petrakos M, Benediktsson J. The effect of classifier agreement on the accuracy of the combined classifier in decision level fusion. IEEE Trans Geosci Remote Sens. 2001;39(11):2539–46.

  25. Caruana R, Niculescu-Mizil A. An empirical comparison of supervised learning algorithms. In: 23rd International Conference on Machine Learning (ICML); 2006; New York: ACM.

  26. Tushkanova O. Comparative analysis of the numerical measures for mining associative and causal relationships in big data. In: Creativity in Intelligent Technologies and Data Science, First Conference Proceedings (CIT&DS); 2015; Russia.

  27. Braga-Neto UM. Classification and error estimation for discrete data. Curr Genomics. 2009;10(7):446–62.

  28. Siegel S, Castellan NJ. Nonparametric statistics for the behavioral sciences. 2nd ed. McGraw-Hill; 1988.

  29. McHugh M. Interrater reliability: the kappa statistic. Biochem Med. 2012;22:276–82.

  30. Viera AJ, Garrett JM. Understanding interobserver agreement: the kappa statistic. Fam Med. 2005;37(5):360–3.

  31. Silva C, Ribeiro B. The importance of stop word removal on recall values in text categorization. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN); 2003; IEEE.

  32. Sun X, Yang Z. Generalized McNemar's test for homogeneity of the marginal distributions. In: SAS Global Forum; 2008; Cary: SAS Institute.

  33. McNemar Q. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika. 1947;12:153–7.

  34. Witten IH, Frank E, Hall MA. Data mining: practical machine learning tools and techniques. 3rd ed. Morgan Kaufmann; 2011.

  35. Japkowicz N, Shah M. Evaluating learning algorithms: a classification perspective. Cambridge: Cambridge University Press; 2011.

Author information

Corresponding author

Correspondence to Neelam Mukhtar.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Informed Consent

All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2008 [6].

Human and Animal Rights

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Mukhtar, N., Khan, M.A. & Chiragh, N. Effective Use of Evaluation Measures for the Validation of Best Classifier in Urdu Sentiment Analysis. Cogn Comput 9, 446–456 (2017). https://doi.org/10.1007/s12559-017-9481-5

