DOI: 10.1145/3555776.3577746
research-article

Arabic Aspect Category Detection for Hotel Reviews based on Data Augmentation and Classifier Chains

Published: 7 June 2023

ABSTRACT

In recent years, the amount of content generated on online hospitality platforms has grown exponentially and has changed people's ways of life. Consumers often consult online reviews before deciding which hotel to choose. These reviews provide firsthand information that is essential to improving the quality of hotel services. However, the massive volume of review data and its unstructured nature make it difficult to analyze. Many researchers have therefore explored sentiment analysis in the hotel industry. In particular, they have focused on aspect-based sentiment analysis, which categorizes opinions by aspect and identifies the sentiment expressed toward each aspect. Studies of the Arabic language, however, remain limited compared with English. This paper explores aspect category detection, a sub-task of aspect-based sentiment analysis, on Arabic reviews. We rely on the SemEval-2016 Arabic dataset of hotel reviews. Because this dataset has an imbalanced class distribution, we propose a multi-label data augmentation approach for its minority classes, followed by a preprocessing procedure tailored to these Arabic reviews. Our aspect category prediction approach is based on the classifier chains technique: unlike previous works that treat each label separately, we handle the dependencies between the labels. Our findings show that the proposed approach achieves an F1 score that outperforms the pioneering related work approaches.
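
To make the classifier chains idea concrete, the sketch below trains one binary classifier per label and feeds each classifier the labels that precede it in the chain, so that later predictions can depend on earlier ones. It is only an illustration and not the authors' implementation: the perceptron base learner, the binary keyword features, the three aspect labels, and all function names are assumptions chosen to keep the example small and dependency-free.

```python
# Toy classifier chain (illustrative only; features and labels are hypothetical).

def predict_one(w, x):
    """Binary perceptron decision; the last weight is the bias."""
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, x))
    return 1 if z > 0 else 0

def train_perceptron(X, y, max_epochs=1000):
    """Fit a perceptron, stopping once an epoch produces no mistakes."""
    w = [0] * (len(X[0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            err = yi - predict_one(w, xi)
            if err != 0:
                mistakes += 1
                for j, xj in enumerate(xi):
                    w[j] += err * xj
                w[-1] += err
        if mistakes == 0:
            break
    return w

def fit_chain(X, Y):
    """Train one classifier per label; link k also sees the true labels 0..k-1."""
    chain = []
    for k in range(len(Y[0])):
        X_aug = [x + y[:k] for x, y in zip(X, Y)]
        y_k = [y[k] for y in Y]
        chain.append(train_perceptron(X_aug, y_k))
    return chain

def predict_chain(chain, x):
    """Predict labels in chain order, appending each prediction to the input."""
    labels = []
    for w in chain:
        labels.append(predict_one(w, x + labels))
    return labels

# Hypothetical keyword indicators: [room, clean, dirty, staff]
X = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
     [0, 0, 0, 1], [1, 0, 0, 1], [0, 1, 0, 1]]
# Hypothetical aspect labels: [rooms, cleanliness, service]
Y = [[1, 0, 0], [1, 1, 0], [1, 1, 0],
     [0, 0, 1], [1, 0, 1], [1, 1, 1]]

chain = fit_chain(X, Y)
print(predict_chain(chain, [0, 1, 0, 1]))
```

In this toy data the "cleanliness" label never occurs without "rooms", which is the kind of label dependency a chain can exploit and that independent per-label classifiers ignore. The chain order is fixed here for simplicity; in practice it is a design choice that can affect results.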

          • Published in

            SAC '23: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing
            March 2023
            1932 pages
            ISBN: 9781450395175
            DOI: 10.1145/3555776

            Copyright © 2023 ACM

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%
