research-article

Autoregressive Feature Extraction with Topic Modeling for Aspect-based Sentiment Analysis of Arabic as a Low-resource Language

Published: 08 February 2024

Abstract

This paper proposes an approach for aspect-based sentiment analysis of Arabic social media data, focusing on the large corpus of Arabic-language tweets posted on X (formerly Twitter) to express opinions during the COVID-19 pandemic. The proposed approach evaluates pre-trained predictive and autoregressive language models, namely Bidirectional Encoder Representations from Transformers (BERT) and XLNet, combined with the topic modeling algorithms Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), for aspect-based sentiment analysis of online Arabic text. In addition, a Bidirectional Long Short-Term Memory (Bi-LSTM) deep learning model is used to classify the aspects extracted from online reviews. The experimental results indicate that the combined XLNet-NMF model outperforms the other implemented state-of-the-art methods by improving feature extraction from unstructured social media text, achieving average sentiment classification accuracy and F-measure values of 0.946 and 0.938, respectively.
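The abstract does not specify how the XLNet features and the NMF topics are combined. One common pattern, assumed here and not confirmed as the authors' method, is to factorize a tweet-by-term count matrix with NMF, treat each tweet's row of W as its topic distribution, and append that distribution to the tweet's contextual embedding before classification. The minimal NumPy sketch below follows that assumption: `term_document_matrix`, `nmf`, and `topic_features` are hypothetical helper names, the factorization uses Lee-Seung multiplicative updates rather than whatever library the authors used, the Arabic tokens are toy data, and random vectors stand in for XLNet embeddings. The Bi-LSTM classifier is not shown.

```python
import numpy as np


def term_document_matrix(docs):
    """Bag-of-words counts (documents x vocabulary) from whitespace-tokenized text."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    index = {tok: j for j, tok in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for tok in doc.split():
            counts[i, index[tok]] += 1.0
    return counts, vocab


def nmf(V, k, n_iter=300, seed=0, eps=1e-10):
    """Approximate non-negative V (n x m) by W @ H, with W (n x k) and H (k x m) kept
    non-negative by Lee-Seung multiplicative updates for the Frobenius-norm loss."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def topic_features(W):
    """Row-normalize W so each document gets a topic distribution (rows sum to ~1)."""
    return W / (W.sum(axis=1, keepdims=True) + 1e-12)


# Toy stand-ins for preprocessed Arabic tweets (vaccine vs. curfew themes).
docs = [
    "لقاح كورونا آمن لقاح",
    "حظر التجول كورونا حظر",
    "لقاح مستشفى آمن",
    "حظر التجول مستشفى",
]

V, vocab = term_document_matrix(docs)
W, H = nmf(V, k=2)
theta = topic_features(W)

# Highest-weighted terms per topic (rows of H).
for t in range(H.shape[0]):
    top_terms = [vocab[j] for j in np.argsort(H[t])[::-1][:3]]
    print(f"topic {t}: {top_terms}")

# Hypothetical placeholder: random vectors in place of XLNet sentence embeddings.
embeddings = np.random.default_rng(1).random((len(docs), 8))

# Combined feature vector per tweet: [contextual embedding | topic distribution].
features = np.hstack([embeddings, theta])
print(features.shape)
```

On this toy corpus one topic should tend to collect the vaccine-related tokens and the other the curfew-related tokens, although the exact split depends on the random initialization. In practice a tested implementation such as scikit-learn's `NMF` applied to TF-IDF features would be a more likely choice than hand-written updates.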

Cited By

  • (2025) Explainable Deep Learning for COVID-19 Vaccine Sentiment in Arabic Tweets Using Multi-Self-Attention BiLSTM with XLNet. Big Data and Cognitive Computing 9, 2 (37). DOI: 10.3390/bdcc9020037. Online publication date: 10-Feb-2025
  • (2025) Performance Comparison of Text Weighting Schemas on NMF-Based Topic Analysis. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi 27, 79 (46-53). DOI: 10.21205/deufmd.2025277907. Online publication date: 23-Jan-2025
  • (2024) Exploring public-private partnerships in Latin America and the Caribbean using topic modeling and sentiment analysis. Caderno Pedagógico 21, 9 (e7428). DOI: 10.54033/cadpedv21n9-023. Online publication date: 4-Sep-2024

    Information

    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 2
    February 2024
    340 pages
    EISSN: 2375-4702
    DOI: 10.1145/3613556

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 February 2024
    Online AM: 27 December 2023
    Accepted: 17 November 2023
    Received: 07 September 2023
    Published in TALLIP Volume 23, Issue 2

    Author Tags

    1. Sentiment analysis
    2. feature extraction
    3. topic modeling
    4. XLNet
    5. latent Dirichlet allocation (LDA)
    6. non-negative matrix factorization (NMF)
    7. Arabic
    8. low-resource language
    9. X (formerly known as Twitter)
    10. COVID-19

    Qualifiers

    • Research-article

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 165
    • Downloads (last 6 weeks): 15
    Reflects downloads up to 02 Mar 2025
