A Sentiment Treebank and Morphologically Enriched Recursive Deep Models for Effective Sentiment Analysis in Arabic

Published: 13 July 2017

Abstract

Accurate sentiment analysis models encode the sentiment of words and their combinations to predict the overall sentiment of a sentence. This task becomes challenging when applied to morphologically rich languages (MRLs). In this article, we evaluate a recent deep learning model, the Recursive Neural Tensor Network (RNTN), for sentiment analysis in Arabic as a case study of MRLs. Although Arabic is not representative of every MRL, the challenges it poses and the solutions we propose are common to many other MRLs. We identify, illustrate, and address MRL-related challenges and show how RNTN is affected by the morphological richness and orthographic ambiguity of the Arabic language. To address the challenges of extracting sentiment from MRL text, we explore different orthographic features as well as different morphological features at multiple levels of abstraction, ranging from raw words to roots. A key requirement for RNTN is the availability of a sentiment treebank: a collection of syntactic parse trees annotated for sentiment at all levels of constituency, which currently exists only for English. Our contribution therefore also includes the creation of the first Arabic Sentiment Treebank (ArSenTB), which is morphologically and orthographically enriched. Experimental results show that, compared to the basic RNTN proposed for English, our solution achieves significant improvements of up to 8% absolute at the phrase level and 10.8% absolute at the sentence level, measured by average F1 score. It also outperforms well-known classifiers, including Support Vector Machines, Recursive Auto Encoders, and Long Short-Term Memory, by 7.6%, 3.2%, and 1.6% absolute, respectively, with all models trained under similar morphological considerations.
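To make the role of the treebank concrete, the following is a minimal sketch of how an RNTN combines two child vectors into a parent vector and applies that step bottom-up over a binary parse tree. It uses the tensor composition published for the English RNTN, p = tanh(c^T V c + W c + b) with c the concatenation of the two children. The function names, the nested-tuple tree format, and the toy parameters are assumptions made for illustration; they are not taken from the article, and the leaf tokens stand in for whichever morphological level (raw word, root, and so on) is chosen.

```python
import math


def rntn_compose(left, right, V, W, b):
    """Combine two d-dimensional child vectors into one parent vector.

    Computes p[k] = tanh(c^T V[k] c + (W c)[k] + b[k]) with c = [left; right].
    V holds d slices of size 2d x 2d, W is d x 2d, b has length d.
    """
    c = left + right  # list concatenation gives the 2d-dimensional vector
    d = len(left)
    parent = []
    for k in range(d):
        # Bilinear (tensor) term for output dimension k.
        bilinear = sum(c[i] * V[k][i][j] * c[j]
                       for i in range(2 * d) for j in range(2 * d))
        # Standard recursive-network term: row k of W times c.
        linear = sum(W[k][j] * c[j] for j in range(2 * d))
        parent.append(math.tanh(bilinear + linear + b[k]))
    return parent


def encode(tree, embed, V, W, b):
    """Encode a binary tree bottom-up.

    A tree is either a token string (leaf) or a 2-tuple of subtrees
    (hypothetical format chosen for this sketch).
    """
    if isinstance(tree, str):
        return embed[tree]
    left, right = tree
    return rntn_compose(encode(left, embed, V, W, b),
                        encode(right, embed, V, W, b), V, W, b)


if __name__ == "__main__":
    # Toy 1-dimensional setup with made-up parameters and placeholder tokens.
    embed = {"tok_a": [0.5], "tok_b": [-0.5]}
    V = [[[0.0, 0.0], [0.0, 0.0]]]
    W = [[1.0, 1.0]]
    b = [0.0]
    print(encode((("tok_a", "tok_b"), "tok_a"), embed, V, W, b))
```

In a full model, each node vector would additionally be passed to a softmax classifier and trained against the sentiment label annotated at that node, which is why a treebank labelled at every constituent is required.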

    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 16, Issue 4
    December 2017
    146 pages
    ISSN:2375-4699
    EISSN:2375-4702
    DOI:10.1145/3097269

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 July 2017
    Accepted: 01 April 2017
    Revised: 01 April 2017
    Received: 01 January 2017
    Published in TALLIP Volume 16, Issue 4

    Author Tags

    1. Arabic morphology
    2. Sentiment analysis
    3. Deep learning
    4. Sentiment treebank

    Qualifiers

    • Research-article
    • Research
    • Refereed
