Abstract
Social network platforms such as Facebook are becoming one of the most powerful sources of information. The data produced and shared on them are large in volume, velocity, and variety, and processing these data in their raw state to extract useful information is a difficult task and a major challenge. Furthermore, Arabic, in both its Modern Standard and dialectal forms, is one of the languages producing the largest quantities of data on social networks, yet it is among the least analyzed. The characteristics and specificity of Arabic pose a major challenge for sentiment analysis, especially when the analysis is performed on Arabic Facebook comments. In this paper, we present a methodology that we have developed for collecting and preprocessing Facebook comments written in Modern Standard Arabic (MSA) or Moroccan Dialectal Arabic (MDA) for Sentiment Analysis (SA) using supervised classification methods. We detail the processing applied to the text of the comments, as well as various schemes for constructing features (words or groups of words) useful for supervised sentiment classification. The methodology was tested on MSA and MDA comments collected from Facebook for sentiment analysis of a political phenomenon. The experimental results are promising and encourage us to continue working on this topic.
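The abstract mentions two stages: processing the comments' text, and building features from words or groups of words. Below is a minimal sketch of what such a pipeline could look like. It is not the authors' implementation. The normalization rules, which strip diacritics and tatweel, unify alef variants, map alef maqsura to ya and ta marbuta to ha, and collapse elongated letters, are common Arabic preprocessing choices assumed here for illustration. The sketch also assumes Arabic-script comments only, so Latin-script (Arabizi) text is dropped. The function names `preprocess`, `ngrams` and `features` and the parameter `max_n` are hypothetical.

```python
import re
from collections import Counter

# Assumed normalization choices (illustrative, not taken from the paper).
URL = re.compile(r"https?://\S+|www\.\S+")
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")  # tashkeel + superscript alef
TATWEEL = "\u0640"                                  # elongation character
NORMALIZE = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> ya
    "\u0629": "\u0647",  # ta marbuta -> ha
})
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")  # assumption: keep Arabic letters only
REPEATS = re.compile(r"(.)\1{2,}")              # 3+ identical chars in a row


def preprocess(comment: str) -> list[str]:
    """Clean one comment and return its word tokens (hypothetical helper)."""
    text = URL.sub(" ", comment)           # drop links
    text = DIACRITICS.sub("", text)        # strip short-vowel marks
    text = text.replace(TATWEEL, "")       # strip tatweel before the letter filter
    text = text.translate(NORMALIZE)       # unify letter variants
    text = NON_ARABIC.sub(" ", text)       # punctuation, digits, emojis -> spaces
    text = REPEATS.sub(r"\1", text)        # collapse elongations
    return text.split()


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Contiguous word n-grams joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(tokens: list[str], max_n: int = 2) -> Counter:
    """Term-frequency counts of all word n-grams for n = 1..max_n."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


if __name__ == "__main__":
    tokens = preprocess("الحكومة رااااائعة جدا!!! http://example.com")
    print(tokens)
    print(features(tokens, max_n=2))
```

For the example comment, `preprocess` returns the three normalized tokens `الحكومه`, `رائعه`, `جدا`. `features` then yields three unigrams and two bigrams, each with count 1. Frequency counts are only one weighting scheme. Binary presence or TF-IDF are equally plausible alternatives. A `Counter` per comment can be passed as a list of dicts to a vectorizer such as scikit-learn's `DictVectorizer` before training a supervised classifier.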
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Elouardighi, A., Maghfour, M., Hammia, H. (2017). Collecting and Processing Arabic Facebook Comments for Sentiment Analysis. In: Ouhammou, Y., Ivanovic, M., Abelló, A., Bellatreche, L. (eds) Model and Data Engineering. MEDI 2017. Lecture Notes in Computer Science, vol 10563. Springer, Cham. https://doi.org/10.1007/978-3-319-66854-3_20
DOI: https://doi.org/10.1007/978-3-319-66854-3_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66853-6
Online ISBN: 978-3-319-66854-3
eBook Packages: Computer Science; Computer Science (R0)