1 Introduction

Large-scale data stream analysis has lately become an important business and research priority. Social networks such as Twitter and other micro-blogging platforms hold an enormous amount of data. Extracting valuable information and trends from these data would support better understanding and decision-making. Multiple analysis techniques have been deployed for English content; although Arabic generates a large amount of content on social networks, it remains among the least analyzed languages.

As of March 2014, there were over 5.7 million Arab Twitter users, 2.4 million of them from Saudi Arabia, together producing an average of over 17 million tweets per day. This huge volume of data provides an opportunity for Sentiment Analysis (SA), enabling organizations to observe the feelings and opinions of Twitter users towards products, policies, or people. Existing solutions for Arabic SA are limited compared to English SA approaches; the unique nature and complexity of the Arabic language calls for research into appropriate solutions. Arabic is a morphologically rich language in which important grammatical information is expressed at the word level. Moreover, Arabic is a collection of multiple variants, where the everyday spoken language, Dialectal Arabic (DA), differs from the formal language, Modern Standard Arabic (MSA). On social media, Arab users have started using their own dialects to express themselves. This has complicated the task of SA, since most Arabic NLP tools have been developed for MSA.

Although research in Arabic SA is still in its early stages, it is increasing rapidly. As shown in Fig. 1, adapted from the work of [1], the number of scientific publications (conference papers and journal articles) has risen sharply over the last couple of years.

Fig. 1. Number of publications in Arabic SA in recent years

This growing interest demands formal and systematic reviews of the area. It is highly important for the scientific community to recognize the state of the art, understand existing methodologies and tools, and address challenges and open issues.

One of the main obstacles in Arabic SA is the scarcity of high-quality resources such as datasets, corpora, and lexicons. This paper reviews the main methods used to create them, their targeted dialects, and their sizes, in addition to their utilization by the reviewed SA approaches. The paper is organized as follows: Sect. 2 presents the survey methodology, Sect. 3 provides an overview of the lexical resources used by the reviewed Arabic SA (ASA) approaches, and Sect. 4 concludes the paper.

2 Survey Methodology

We followed the process of [1] in collecting the articles. The search was conducted using the keywords 'Arabic subjectivity and sentiment analysis', 'Arabic opinion mining', 'Comparative opinions Arabic', and 'Opinion spam Arabic' in the following databases: Google Scholar, Springer, IEEE Xplore, the ACM Digital Library, and Science Direct. The review covers papers published up to 2015. A total of 28 articles were selected from the retrieved publications; these included articles that introduced a new ASA resource and were not covered in [1]. The articles were then categorized as either an ASA approach or a resource, depending on their contributions. For the ASA resources, we included those used by the surveyed approaches, in addition to any resources not covered by previous surveys. The following sections review these resources, divided into lexicons and corpora/datasets. In each section, the articles are presented in tabulated form to ease readability. The aim is to provide a valuable reference for researchers considering ASA.

3 Resources

In this section we cover the linguistic resources essential to ASA approaches: sentiment lexicons and corpora.

3.1 Sentiment Lexicons

Here we review papers that reported the construction of a lexicon without presenting any new SA methods. Papers that both constructed a new lexicon and developed a new approach using it are listed in the summary table (Table 1) for reference. The proposed lexicons are marked as publicly available where applicable; otherwise, NA denotes that the lexicon is not available, and AOR that it is available on request.

Table 1. Lexica used in ASA techniques

In an attempt to produce an Arabic SentiWordNet (SWN), Al-Hazmi et al. [2] proposed a methodology for mapping SWN 3.0 to Arabic. However, this resource has limited coverage (10 K), was not tested in a sentiment analysis setting, and is not publicly available. Badaro et al. [3], however, present pioneering work in the same direction by constructing ArSenL, a large-scale Arabic sentiment lexicon. They relied on four resources to create ArSenL: English WordNet (EWN), Arabic WordNet (AWN), English SentiWordNet (ESWN), and SAMA (Standard Arabic Morphological Analyzer). Two approaches were followed, producing two different lexicons, each validated separately; the union of the two lexicons was then validated and produced the best performance. The first approach used AWN, mapping AWN entries into ESWN using existing offsets, thus producing ArSenL-AWN. The second approach utilized SAMA's English glosses, finding the highest-overlapping synsets between these glosses and ESWN, thus producing ArSenL-Eng. ArSenL is the union of these two lexicons. The authors evaluated the lexicon by comparing it to the SIFAAT lexicon [4]; it gave the highest coverage and the best performance in subjectivity and sentiment classification. Although this lexicon can be considered the largest Arabic sentiment lexicon developed to date, it unfortunately contains only MSA entries and no dialect words, and it was not developed from a social media context, which could affect accuracy when applied to social media text. Following the example of ArSenL, the lexicon SLSA (Sentiment Lexicon for Standard Arabic) [5] was constructed by linking the lexicon of the Arabic morphological analyzer Aramorph with SentiWordNet. Although the approach is very similar to ArSenL's, since both use SentiWordNet to obtain word scores, the authors argue that SLSA uses Aramorph, a free resource, whereas ArSenL uses SAMA, which is not free and thus makes ArSenL not publicly available.
The linking algorithm used to connect the glosses in Aramorph with those in SentiWordNet also differs. SLSA starts by linking each entry in Aramorph with SentiWordNet if the one-word gloss and POS match. To accommodate the unlinked entries, the POS match is then relaxed to include cases where the same lemma carries both noun and adjective POS tags; the next step ignores the POS completely. For multi-word glosses, the stop words are removed and the relaxed condition is tested on each word separately. This covers 98.2 % of the entries in Aramorph. Intrinsic and extrinsic evaluations comparing SLSA and ArSenL demonstrated the superiority of SLSA. Nevertheless, SLSA, like ArSenL, does not include dialect words and cannot accurately analyze social media text.
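The successively relaxed matching described above can be sketched as follows. This is a minimal illustration, not the SLSA implementation: the entries, glosses, and scores are toy stand-ins for Aramorph and SentiWordNet, and the exact relaxation order is simplified from the paper's description.

```python
# Toy sketch of SLSA-style relaxed gloss/POS linking (illustrative data only).

STOPWORDS = {"a", "an", "the", "of", "to"}

# Stand-in "Aramorph" entries: (lemma, POS, English gloss)
aramorph = [
    ("jamiyl", "adj", "beautiful"),
    ("Hazan", "noun", "sadness"),
    ("saEiyd", "adj", "happy person"),   # multi-word gloss
]

# Stand-in "SentiWordNet": (gloss word, POS) -> (pos_score, neg_score)
swn = {
    ("beautiful", "adj"): (0.75, 0.0),
    ("sadness", "noun"): (0.0, 0.625),
    ("happy", "adj"): (0.875, 0.0),
}

def link(lemma, pos, gloss):
    """Try successively relaxed matches: gloss+POS, then gloss only,
    then per-word matching for multi-word glosses (stop words removed)."""
    # Step 1: one-word gloss with matching POS.
    if (gloss, pos) in swn:
        return swn[(gloss, pos)]
    # Step 2: same gloss, any POS.
    for (g, _p), scores in swn.items():
        if g == gloss:
            return scores
    # Step 3: multi-word gloss -> test each non-stop word separately.
    for word in gloss.split():
        if word in STOPWORDS:
            continue
        for (g, _p), scores in swn.items():
            if g == word:
                return scores
    return None  # entry remains unlinked

linked = {lemma: link(lemma, pos, gloss) for lemma, pos, gloss in aramorph}
print(linked)  # "saEiyd" is linked via the word "happy" in its gloss
```

The step ordering matters: exact matches are consumed first, so the looser rules only fire for entries that would otherwise stay unlinked, which is how the paper reports reaching 98.2 % coverage.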

In [6] a bilingual sentiment lexicon was developed specifically for mining Dark Web forums. Two lexicons were built: SentiLEn for English and SentiLAr for Arabic. The Arabic lexicon was constructed by extracting sentiment words related to cyber threats, radicalism, and conflicts from 2000 message posts of the Alokab Web forum. Three Arabic language experts annotated the polarity of the extracted terms, giving each term a positive score in [0, 1] and a negative score in [0, 1]. If a word is always positive, its positive score is 1 and its negative score is 0; similarly, if a word is always negative, its negative score is 1 and its positive score is 0. For words used in both positive and negative contexts, positive and negative polarity scores are assigned in the range [0, 1] such that their sum is 1. Two additional scores are also given to each term for strong and hostile valences. The scores given by the three experts are then aggregated and normalized to lie in [-1, 1]. The paper only reported the construction of the lexicon; nothing was reported about validating it in a real application.
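A plausible reading of this aggregation scheme is averaging the three experts' scores and taking the positive-minus-negative difference, which lands naturally in [-1, 1] since each pair sums to 1. The paper only states that scores are aggregated and normalized, so the averaging and differencing below are assumptions for illustration.

```python
# Hedged sketch of aggregating per-expert (positive, negative) polarity
# scores into a single value in [-1, 1]; the averaging scheme is assumed.

def aggregate(expert_scores):
    """expert_scores: list of (pos, neg) pairs, each with pos + neg == 1.
    Returns a single polarity: +1 = always positive, -1 = always negative."""
    n = len(expert_scores)
    avg_pos = sum(p for p, _ in expert_scores) / n
    avg_neg = sum(q for _, q in expert_scores) / n
    return avg_pos - avg_neg

# Three annotators rating a mostly negative term (result is close to -0.8):
print(aggregate([(0.0, 1.0), (0.2, 0.8), (0.1, 0.9)]))
```

Because each expert's pair sums to 1, the difference of the averages is already bounded by [-1, 1], so no further rescaling is needed under this assumption.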

Starting from a small seed list of positive and negative words, Mahyoub et al. [7] used semi-supervised learning to propagate scores over the Arabic WordNet by exploiting synset relations. They used the same relations that [8] used in developing WordNet-Affect to expand the seed list. These comprise eight semantic/lexical relations: {near_synonym, verb_group, see_also_wn15, has_derived, related_to, has_subevent, causes, near_antonym}. The lexicon was evaluated on two corpora of movie and book reviews. Although it achieved high accuracy in evaluation, the lexicon still has low coverage (7576 words) and does not include dialect words.
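The general idea of propagating seed polarities over such relations can be illustrated with a small toy graph. This sketch is not the algorithm of [7]: the edges are invented, a simple one-pass breadth-first spread replaces their semi-supervised learning, and only near_antonym is assumed to flip polarity.

```python
# Toy sketch of seed-list propagation over WordNet-style relation edges.
from collections import deque

# (word, relation, word) edges; near_antonym is assumed to flip polarity.
edges = [
    ("good", "near_synonym", "fine"),
    ("fine", "has_derived", "finely"),
    ("good", "near_antonym", "bad"),
    ("bad", "near_synonym", "awful"),
]

def propagate(seeds):
    """seeds: {word: +1 or -1}. Spread labels along relation edges,
    treating relations as symmetric and labeling each word once."""
    labels = dict(seeds)
    graph = {}
    for a, rel, b in edges:
        graph.setdefault(a, []).append((rel, b))
        graph.setdefault(b, []).append((rel, a))
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for rel, nxt in graph.get(word, []):
            if nxt not in labels:
                labels[nxt] = -labels[word] if rel == "near_antonym" else labels[word]
                queue.append(nxt)
    return labels

# A single positive seed labels synonyms/derivations +1 and antonyms -1:
print(propagate({"good": 1}))
```

In this simplified form the seed "good" pulls "fine" and "finely" to +1, while the near_antonym edge pushes "bad" and, transitively, "awful" to -1.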

One of the challenges in sentiment analysis is handling phrases and idioms that convey sentiment. While sentiment words are significant clues for detecting sentiment in text, users tend to use common phrases and idioms to express their opinions. These phrases are made up of varying numbers of words that are usually not sentiment-bearing, and when treated separately by a sentiment analysis algorithm they would not be detected as sentiment clues. Consequently, some efforts have been initiated to address this challenge. The authors of [9] constructed an idioms/proverbs lexicon for the Egyptian dialect. They collected 32,785 idioms/proverbs from Arabic websites that present directories and encyclopedias of common Egyptian idioms and proverbs, then selected 3632 common phrases and manually annotated them for polarity (positive, negative). To check the coverage of this lexicon, they developed a technique to detect and extract phrases in text using similarity measures (cosine similarity and Levenshtein distance) combined with n-grams, reaching 98 % accuracy when applied to tweets and reviews.
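The detection step can be sketched by sliding n-grams over the text and fuzzy-matching them against the lexicon. This is an illustrative sketch in the spirit of [9], not their implementation: only Levenshtein distance is shown (their cosine measure is omitted), and the lexicon entries and threshold are invented, using an English idiom for readability.

```python
# Toy sketch of idiom detection via n-gram candidates + edit-distance matching.

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def find_idioms(text, lexicon, threshold=0.8):
    """Slide n-grams over the text; report lexicon phrases whose
    normalized similarity to a candidate exceeds the threshold."""
    words = text.split()
    hits = []
    for phrase, polarity in lexicon.items():
        n = len(phrase.split())
        for i in range(len(words) - n + 1):
            candidate = " ".join(words[i:i + n])
            sim = 1 - levenshtein(candidate, phrase) / max(len(candidate), len(phrase))
            if sim >= threshold:
                hits.append((phrase, polarity, sim))
    return hits

lexicon = {"break a leg": "positive"}
print(find_idioms("go break a leg tonight", lexicon))
```

Normalizing the edit distance by the longer string length gives a similarity in [0, 1], so near-matches caused by spelling variation (common in dialectal social media text) can still clear the threshold.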

The Arabic lexical semantics database (RDI-ArabSemanticDB) [10] was exploited in [11] to construct an Arabic sentiment lexicon. RDI-ArabSemanticDB contains approximately 150,000 Arabic words, 18,413 semantic fields, and 20 semantic relations, including synonymy, antonymy, hyponymy, and causality. These relations were used to expand a seed list of positive, negative, and neutral words. The lexicon was tested by first comparing it to a translated version of the MPQA lexicon and to a manually annotated subset of the lexicon; the results showed that translating an English lexicon does not give accurate results. The lexicon was also tested with different machine learning classifiers for Arabic sentiment, using a translated version of the MPQA corpus.

3.2 Corpora and Datasets

Applying sentiment analysis requires a corpus to train a classifier or to evaluate it. This section covers Arabic sentiment analysis research and the corpora it used. Most of the corpora were collected from social media, because the content is provided freely, easily, and instantaneously, and users can express, reach, and share opinions in public. Table 2 shows the main available corpora, in MSA or dialect, used in sentiment analysis.

Table 2. Corpora used in ASA techniques

The authors of [5] and [30] used the corpus of [31], which was based on [32], while the OCA corpus [33] was used by [13] and [34]. The authors of [28] used the HAAD corpus produced by [35], which reduced and reused the LABR corpus [36]. The authors of [37] used the corpus of [38], and [39] utilized the corpus created in [40].