research-article

AROMA: A Recursive Deep Learning Model for Opinion Mining in Arabic as a Low Resource Language

Authors:

Ahmad Al-Sallab,

Khaled Bashir Shaban,

Wassim El-Hajj,

Gilbert Badaro

ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), Volume 16, Issue 4

Article No.: 25, Pages 1-20

https://doi.org/10.1145/3086575

Published: 13 July 2017

Abstract

While research on English opinion mining has already achieved significant progress, work on Arabic opinion mining still lags behind. This gap stems mainly from three factors: research on natural language processing (NLP) methods for Arabic is relatively recent, Arabic morphology is complex and difficult to handle, and large-scale Arabic opinion resources are lacking. To close this gap, we examine a class of models that has been used for English and does not require extensive NLP tools or opinion resources; in particular, we consider the Recursive Auto Encoder (RAE). However, RAE models are less successful in Arabic than in English because they are limited in three respects: handling the morphological complexity of Arabic, providing complete and comprehensive input features to the autoencoder, and performing semantic composition that follows the natural way constituents combine to express the overall meaning. In this article, we propose AROMA (A Recursive Deep Learning Model for Opinion Mining in Arabic), which addresses these limitations. AROMA was evaluated on three Arabic corpora representing different genres and writing styles. Results show that AROMA achieved significant performance improvements over the baseline RAE and also outperformed several well-known approaches in the literature.
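The baseline named in the abstract, the Recursive Auto Encoder, builds phrase and sentence vectors by composing word vectors pairwise. The sketch below is a minimal, untrained illustration of that composition step, loosely following the greedy RAE of Socher et al. (2011): each parent is p = tanh(W_enc [c1; c2] + b_enc), a linear decoder tries to reconstruct the two children from p, the adjacent pair with the lowest reconstruction error is merged first, and a softmax over the root vector gives a sentiment distribution. It is not AROMA, and the class and method names (TinyRAE, compose, greedy_tree, predict), the random weights, and the toy word vectors are all hypothetical. Because the tree here is chosen by reconstruction error rather than by syntax, it need not follow linguistic constituency, which appears to be one of the baseline limitations the abstract refers to.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def vadd(u, v):
    """Element-wise vector addition."""
    return [a + b for a, b in zip(u, v)]


class TinyRAE:
    """Hypothetical, untrained recursive autoencoder used only for illustration."""

    def __init__(self, dim, num_classes=2, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.W_enc, self.b_enc = mat(dim, 2 * dim), [0.0] * dim        # [c1; c2] -> p
        self.W_dec, self.b_dec = mat(2 * dim, dim), [0.0] * (2 * dim)  # p -> [c1'; c2']
        self.W_cls, self.b_cls = mat(num_classes, dim), [0.0] * num_classes

    def compose(self, c1, c2):
        """Parent p = tanh(W_enc [c1; c2] + b_enc), here scaled to unit length."""
        p = [math.tanh(v) for v in vadd(matvec(self.W_enc, c1 + c2), self.b_enc)]
        norm = math.sqrt(sum(v * v for v in p)) or 1.0
        return [v / norm for v in p]

    def reconstruction_error(self, c1, c2, p):
        """0.5 * squared distance between [c1; c2] and its linear reconstruction from p."""
        target = c1 + c2
        rec = vadd(matvec(self.W_dec, p), self.b_dec)
        return 0.5 * sum((r - t) ** 2 for r, t in zip(rec, target))

    def greedy_tree(self, leaves):
        """Repeatedly merge the adjacent pair with the lowest reconstruction error."""
        nodes = [list(v) for v in leaves]  # copy so the caller's vectors are untouched
        total_error = 0.0
        while len(nodes) > 1:
            best = None
            for i in range(len(nodes) - 1):
                p = self.compose(nodes[i], nodes[i + 1])
                err = self.reconstruction_error(nodes[i], nodes[i + 1], p)
                if best is None or err < best[0]:
                    best = (err, i, p)
            err, i, p = best
            total_error += err
            nodes[i:i + 2] = [p]  # replace the two children by their parent
        return nodes[0], total_error

    def predict(self, root):
        """Softmax sentiment distribution over the root (sentence) vector."""
        scores = vadd(matvec(self.W_cls, root), self.b_cls)
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]


if __name__ == "__main__":
    # Four made-up 3-dimensional "word vectors" standing in for a short sentence.
    words = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2], [0.5, 0.1, 0.1], [-0.3, 0.2, 0.0]]
    rae = TinyRAE(dim=3)
    root, err = rae.greedy_tree(words)
    print("root:", [round(v, 3) for v in root])
    print("reconstruction error:", round(err, 4))
    print("class distribution:", [round(q, 3) for q in rae.predict(root)])
```

In a real system the weights would be trained jointly on the reconstruction and classification losses, and the word vectors would come from a learned embedding; the sketch only shows how a single sentence vector emerges from pairwise composition.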

Index Terms

Information systems → Information retrieval → Retrieval tasks and goals → Sentiment analysis

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing Volume 16, Issue 4

December 2017

146 pages

ISSN:2375-4699

EISSN:2375-4702

DOI:10.1145/3097269

Editor:
Nianwen Xue
Brandeis University, Waltham, USA

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 July 2017

Accepted: 01 April 2017

Revised: 01 February 2017

Received: 01 May 2016

Published in TALLIP Volume 16, Issue 4

Qualifiers

Research-article
Research
Refereed

Bibliometrics & Citations

Article Metrics

Total Citations: 53
Total Downloads: 463
Downloads (Last 12 months): 20
Downloads (Last 6 weeks): 5

Reflects downloads up to 05 Mar 2025

Cited By

Jabbar M (2025). The Semantic Implications of the Arabic Language: Exploring Meaning Through Intelligent Algorithms in Machine Learning. In Intelligent Systems, Blockchain, and Communication Technologies, 838-847. Online publication date: 5-Mar-2025. https://doi.org/10.1007/978-3-031-82377-0_67

Alshammary M, Uddin M, Khan L (2024). RFPG: Question-Answering from Low-Resource Language (Arabic) Texts using Factually Aware RAG. In 2024 IEEE 10th International Conference on Collaboration and Internet Computing (CIC), 107-116. Online publication date: 28-Oct-2024. https://doi.org/10.1109/CIC62241.2024.00023

Radman A, Duwairi R (2024). Towards a robust deep learning framework for Arabic sentiment analysis. Natural Language Processing, 1-35. Online publication date: 6-Sep-2024. https://doi.org/10.1017/nlp.2024.35

Aladeemy A, Alzahrani A, Algarni M, Alsubari S, Aldhyani T, Deshmukh S, Khalaf O, Wong W, Aqburi S (2024). Advancements and challenges in Arabic sentiment analysis: A decade of methodologies, applications, and resource development. Heliyon 10, 21 (e39786). Online publication date: Nov-2024. https://doi.org/10.1016/j.heliyon.2024.e39786

Viet N, Quang N, King N, Thanh D (2024). Performance Insights of Attention-Free Language Models in Sentiment Analysis: A Case Study for E-Commerce Platforms in Vietnam. In Inventive Communication and Computational Technologies, 29-42. Online publication date: 15-Dec-2024. https://doi.org/10.1007/978-981-97-7710-5_3

Karajeh O, Al-Kabi M, Fox E (2024). Multi-dimensional Edge-Embedded GCNs for Arabic Text Classification. In Linking Theory and Practice of Digital Libraries, 241-255. Online publication date: 24-Sep-2024. https://dl.acm.org/doi/10.1007/978-3-031-72437-4_14

Bamidele Awotunde J, Misra S, Katta V, Charles Adebayo O (2023). An Ensemble-Based Hotel Reviews System Using Naive Bayes Classifier. Computer Modeling in Engineering & Sciences 137, 1, 131-154. Online publication date: 2023. https://doi.org/10.32604/cmes.2023.026812

Atabuzzaman M, Shajalal M, Baby M, Boden A (2023). Arabic Sentiment Analysis with Noisy Deep Explainable Model. In Proceedings of the 2023 7th International Conference on Natural Language Processing and Information Retrieval, 185-189. Online publication date: 15-Dec-2023. https://dl.acm.org/doi/10.1145/3639233.3639241

Al-Jarrah I, Mustafa A, Najadat H (2023). Aspect-Based Sentiment Analysis for Arabic Food Delivery Reviews. ACM Transactions on Asian and Low-Resource Language Information Processing 22, 7, 1-18. Online publication date: 20-Jul-2023. https://dl.acm.org/doi/10.1145/3605146

Um S, Kim D, Kim J, El Saddik A, Mei T, Cucchiara R, Bertini M, Tobon Vallejo D, Atrey P, Hossain M (2023). Audio-Visual Spatial Integration and Recursive Attention for Robust Sound Source Localization. In Proceedings of the 31st ACM International Conference on Multimedia, 3507-3516. Online publication date: 26-Oct-2023. https://dl.acm.org/doi/10.1145/3581783.3611722
