Abstract
Social media is now widely used to express opinions on a variety of topics. Opinion mining, or sentiment analysis, techniques extract opinions from user-generated content. Over the years, a multitude of sentiment analysis studies have addressed English, while other languages have received far less attention. Arabic is one of the languages that lacks substantial research, despite the rapid growth of its use on social media. Furthermore, specific Arabic dialects should be studied, not just Modern Standard Arabic. In this paper, we experiment with sentiment analysis of the Iraqi Arabic dialect using word embeddings. First, we built a large corpus from previous works to learn word representations. Second, we generated a word embedding model by training Doc2Vec representations on the corpus, based on Paragraph Vectors and specifically the Distributed Memory Model of Paragraph Vectors (DM-PV) architecture. Lastly, the resulting features were used to train four binary classifiers (Logistic Regression, Decision Tree, Support Vector Machine, and Naive Bayes) to detect sentiment. We also experimented with different parameter values (window size, dimension, and negative samples). The experiments show that, with our approach, Logistic Regression and Support Vector Machine perform better than the other classifiers.
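The experimental design described above (sweeping window size, dimension, and negative samples, then evaluating four classifiers on each embedding) can be sketched as a simple grid. This is only an illustration: the abstract does not report the values that were tried, so the numbers below are hypothetical. The keyword names (`dm`, `window`, `vector_size`, `negative`) follow gensim's `Doc2Vec` naming, which is an assumption, since the abstract does not say which implementation was used.

```python
from itertools import product

# Hypothetical values: the abstract names the parameters but not their settings.
WINDOW_SIZES = [2, 5, 10]
DIMENSIONS = [100, 200, 300]
NEGATIVE_SAMPLES = [5, 10]

# The four binary classifiers listed in the abstract.
CLASSIFIERS = [
    "Logistic Regression",
    "Decision Tree",
    "Support Vector Machine",
    "Naive Bayes",
]


def embedding_configs():
    """Return one settings dict per combination of window, dimension, and negative samples."""
    return [
        # dm=1 selects the Distributed Memory architecture in gensim's naming (assumed).
        {"dm": 1, "window": w, "vector_size": d, "negative": n}
        for w, d, n in product(WINDOW_SIZES, DIMENSIONS, NEGATIVE_SAMPLES)
    ]


def experiment_runs():
    """Pair every embedding configuration with every classifier."""
    return [(cfg, clf) for cfg in embedding_configs() for clf in CLASSIFIERS]


if __name__ == "__main__":
    print(len(embedding_configs()), "embedding configurations")
    print(len(experiment_runs()), "classifier runs in total")
```

Each configuration would correspond to one trained document-embedding model, whose vectors are then passed to each of the four classifiers for comparison.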
Index Terms
- Sentiment Analysis of Iraqi Arabic Dialect on Facebook Based on Distributed Representations of Documents