ABSTRACT
The use of social networks such as Facebook and Twitter has grown remarkably. Users from different cultures and backgrounds post large volumes of textual comments that reflect their opinions on different aspects of life, and make them available to everyone. We study the case of Twitter, focusing on the 2012 presidential elections in Egypt. This paper compares two techniques for Arabic text classification using the WEKA application: Support Vector Machine (SVM) and Naïve Bayesian (NB). We investigate the use of TF-IDF to obtain the document vectors. The main objective of this paper is to measure the accuracy and the time taken to produce a result for each classifier, and to determine which classifier is more accurate for Arabic text classification.
The comparison reported in this paper shows that the Naïve Bayesian method achieves the highest accuracy and the lowest error rate.
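As an illustration of the TF-IDF document vectors mentioned above, the following is a minimal sketch in Python rather than WEKA. It assumes the textbook weighting (term frequency normalised by document length, multiplied by the log of the inverse document frequency); WEKA's own filter may use a different variant, and the abstract does not state which settings were used. The function name `tfidf_vectors` and the toy English tokens are hypothetical and do not come from the paper's data set.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) for a list of tokenized documents.

    Assumed weighting: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append([
            (counts[t] / total) * math.log(n_docs / df[t]) if total else 0.0
            for t in vocab
        ])
    return vocab, vectors


# Toy, made-up tokenized tweets (illustrative only).
docs = [
    ["candidate", "good", "good"],
    ["candidate", "bad"],
    ["election", "good"],
]
vocab, vectors = tfidf_vectors(docs)
```

Each row of `vectors` is one document expressed over the shared vocabulary, which is the kind of input a classifier such as SVM or Naïve Bayes would then be trained on.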
- Political Sentiment Analysis Using Twitter Data