research-article

AROMA: A Recursive Deep Learning Model for Opinion Mining in Arabic as a Low Resource Language

Published: 13 July 2017

Abstract

While research on English opinion mining has already achieved significant progress and success, work on Arabic opinion mining still lags behind. This is mainly due to the relative recency of research efforts in developing natural language processing (NLP) methods for Arabic, the difficulty of handling its morphological complexity, and the lack of large-scale opinion resources for Arabic. To close this gap, we examine a class of models that are used for English and do not require extensive NLP tools or opinion resources. In particular, we consider the Recursive Auto Encoder (RAE). However, RAE models are not as successful in Arabic as they are in English, because of their limitations in handling the morphological complexity of Arabic, in providing more complete and comprehensive input features to the auto encoder, and in performing semantic composition that follows the natural way constituents combine to express the overall meaning. In this article, we propose AROMA, a Recursive Deep Learning Model for Opinion Mining in Arabic, which addresses these limitations. AROMA was evaluated on three Arabic corpora representing different genres and writing styles. Results show that AROMA achieved significant performance improvements over the baseline RAE and also outperformed several well-known approaches from the literature.
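To make the baseline concrete, here is a minimal sketch of how a plain recursive auto encoder composes word vectors, assuming the unsupervised formulation of Socher et al. (2011): each parent is p = tanh(W_e[c1; c2] + b_e), normalized to unit length; the model tries to reconstruct [c1; c2] from p with a linear decoder; and the tree is built greedily by merging the adjacent pair with the lowest (here unweighted) reconstruction error. This illustrates the baseline RAE only, not AROMA, whose internals are not described in this abstract. Weights are random and untrained, there is no sentiment classifier or backpropagation, the embedding dimension is arbitrary, and the names `RAE`, `compose`, and `greedy_tree` are hypothetical. The greedy, reconstruction-driven bracketing is arguably what the third limitation above refers to, since it need not match how constituents actually combine.

```python
import math
import random

# Sketch of the BASELINE unsupervised RAE composition step (Socher et al.
# style). Not AROMA. Weights are random/untrained; names are hypothetical.

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def add(a, b):
    return [ai + bi for ai, bi in zip(a, b)]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class RAE:
    def __init__(self, d, seed=0):
        rng = random.Random(seed)
        self.d = d
        self.W_e = rand_matrix(d, 2 * d, rng)  # encoder: [c1; c2] -> p
        self.b_e = [0.0] * d
        self.W_r = rand_matrix(2 * d, d, rng)  # decoder: p -> [c1'; c2']
        self.b_r = [0.0] * (2 * d)

    def compose(self, c1, c2):
        """p = tanh(W_e [c1; c2] + b_e), rescaled to unit length."""
        z = add(matvec(self.W_e, c1 + c2), self.b_e)  # c1 + c2 concatenates lists
        p = [math.tanh(v) for v in z]
        norm = math.sqrt(sum(v * v for v in p)) or 1.0
        return [v / norm for v in p]

    def reconstruction_error(self, c1, c2, p):
        """0.5 * ||[c1; c2] - (W_r p + b_r)||^2 (unweighted)."""
        target = c1 + c2
        recon = add(matvec(self.W_r, p), self.b_r)
        return 0.5 * sum((t - r) ** 2 for t, r in zip(target, recon))

    def greedy_tree(self, vectors):
        """Merge the adjacent pair with the lowest reconstruction error until
        one vector (the sentence representation) remains."""
        nodes = [(v, str(i)) for i, v in enumerate(vectors)]
        total_error = 0.0
        while len(nodes) > 1:
            best = None
            for i in range(len(nodes) - 1):
                c1, c2 = nodes[i][0], nodes[i + 1][0]
                p = self.compose(c1, c2)
                err = self.reconstruction_error(c1, c2, p)
                if best is None or err < best[0]:
                    best = (err, i, p)
            err, i, p = best
            total_error += err
            label = "(" + nodes[i][1] + " " + nodes[i + 1][1] + ")"
            nodes[i:i + 2] = [(p, label)]  # replace the pair by its parent
        return nodes[0][0], nodes[0][1], total_error

# Toy usage with hypothetical 4-dimensional random "word embeddings".
rng = random.Random(42)
tokens = ["the", "book", "was", "great"]
embeddings = [[rng.uniform(-1, 1) for _ in range(4)] for _ in tokens]
rae = RAE(d=4)
sentence_vec, tree, err = rae.greedy_tree(embeddings)
print(tree)               # bracketing over token indices 0..3
print(len(sentence_vec))  # 4
```

In a trained RAE the same greedy procedure runs with learned weights, so the bracketing is driven by reconstruction quality rather than by a syntactic parse.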


Cited By

  • (2025) The Semantic Implications of the Arabic Language: Exploring Meaning Through Intelligent Algorithms in Machine Learning. Intelligent Systems, Blockchain, and Communication Technologies, 838-847. DOI: 10.1007/978-3-031-82377-0_67. Online publication date: 5-Mar-2025.
  • (2024) RFPG: Question-Answering from Low-Resource Language (Arabic) Texts using Factually Aware RAG. 2024 IEEE 10th International Conference on Collaboration and Internet Computing (CIC), 107-116. DOI: 10.1109/CIC62241.2024.00023. Online publication date: 28-Oct-2024.
  • (2024) Towards a robust deep learning framework for Arabic sentiment analysis. Natural Language Processing, 1-35. DOI: 10.1017/nlp.2024.35. Online publication date: 6-Sep-2024.


    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 16, Issue 4
    December 2017
    146 pages
    ISSN:2375-4699
    EISSN:2375-4702
    DOI:10.1145/3097269

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 July 2017
    Accepted: 01 April 2017
    Revised: 01 February 2017
    Received: 01 May 2016
    Published in TALLIP Volume 16, Issue 4


    Author Tags

    1. Deep Learning
    2. Opinion mining in Arabic
    3. Recursive Auto Encoder
    4. Recursive Neural Networks

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Article Metrics

    • Downloads (last 12 months): 20
    • Downloads (last 6 weeks): 5
    Reflects downloads up to 05 Mar 2025

    Citations

    • (2024) Advancements and challenges in Arabic sentiment analysis: A decade of methodologies, applications, and resource development. Heliyon 10:21, e39786. DOI: 10.1016/j.heliyon.2024.e39786. Online publication date: Nov-2024.
    • (2024) Performance Insights of Attention-Free Language Models in Sentiment Analysis: A Case Study for E-Commerce Platforms in Vietnam. Inventive Communication and Computational Technologies, 29-42. DOI: 10.1007/978-981-97-7710-5_3. Online publication date: 15-Dec-2024.
    • (2024) Multi-dimensional Edge-Embedded GCNs for Arabic Text Classification. Linking Theory and Practice of Digital Libraries, 241-255. DOI: 10.1007/978-3-031-72437-4_14. Online publication date: 24-Sep-2024.
    • (2023) An Ensemble-Based Hotel Reviews System Using Naive Bayes Classifier. Computer Modeling in Engineering & Sciences 137:1, 131-154. DOI: 10.32604/cmes.2023.026812. Online publication date: 2023.
    • (2023) Arabic Sentiment Analysis with Noisy Deep Explainable Model. Proceedings of the 2023 7th International Conference on Natural Language Processing and Information Retrieval, 185-189. DOI: 10.1145/3639233.3639241. Online publication date: 15-Dec-2023.
    • (2023) Aspect-Based Sentiment Analysis for Arabic Food Delivery Reviews. ACM Transactions on Asian and Low-Resource Language Information Processing 22:7, 1-18. DOI: 10.1145/3605146. Online publication date: 20-Jul-2023.
    • (2023) Audio-Visual Spatial Integration and Recursive Attention for Robust Sound Source Localization. Proceedings of the 31st ACM International Conference on Multimedia, 3507-3516. DOI: 10.1145/3581783.3611722. Online publication date: 26-Oct-2023.
