ABSTRACT
The amount of content generated on online hospitality platforms has grown rapidly in recent years and has changed how people make travel decisions. Consumers often consult online reviews before choosing a hotel. These reviews provide firsthand information that is essential for improving the quality of hotel services. However, the sheer volume of review data and its unstructured nature make it difficult to analyze. Many researchers have therefore explored sentiment analysis in the hotel industry. In particular, they have focused on aspect-based sentiment analysis, which categorizes opinions by aspect and identifies the sentiment associated with each aspect. Studies of the Arabic language, however, remain limited compared to English. This paper explores aspect category detection, a sub-task of aspect-based sentiment analysis, on Arabic reviews. We rely on the SemEval-2016 Arabic dataset of hotel reviews. Because this dataset has an imbalanced label distribution, we propose a multi-label data augmentation approach for its minority classes. We then propose a preprocessing pipeline tailored to these Arabic reviews. Our aspect category prediction approach is based on the classifier chains technique: unlike previous works that treat each label separately, we model the dependencies between labels. Our findings show that the proposed approach achieves an F1 score that exceeds those of the earlier approaches in the related work.
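To illustrate the classifier chains idea mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a toy dataset of two binary features and two labels, and uses a small perceptron as a stand-in base learner; the abstract does not name the base classifier, so the `Perceptron` and `ClassifierChain` classes here are hypothetical. Each model in the chain receives the original features plus the labels that precede it, which is how dependencies between labels can be exploited.

```python
from typing import List

Vector = List[int]


class Perceptron:
    """Tiny binary linear classifier, used only as a stand-in base learner."""

    def __init__(self, epochs: int = 1000) -> None:
        self.epochs = epochs
        self.weights: Vector = []
        self.bias = 0

    def predict_one(self, x: Vector) -> int:
        score = sum(w * v for w, v in zip(self.weights, x)) + self.bias
        return 1 if score > 0 else 0

    def fit(self, X: List[Vector], y: List[int]) -> "Perceptron":
        self.weights = [0] * len(X[0])
        self.bias = 0
        for _ in range(self.epochs):
            mistakes = 0
            for x, target in zip(X, y):
                error = target - self.predict_one(x)
                if error:  # update weights only on a misclassified example
                    self.weights = [w + error * v for w, v in zip(self.weights, x)]
                    self.bias += error
                    mistakes += 1
            if mistakes == 0:  # every training example is classified correctly
                break
        return self


class ClassifierChain:
    """One binary model per label; model j also sees labels 0..j-1 as features."""

    def __init__(self, n_labels: int) -> None:
        self.models = [Perceptron() for _ in range(n_labels)]

    def fit(self, X: List[Vector], Y: List[Vector]) -> "ClassifierChain":
        for j, model in enumerate(self.models):
            # Training uses the true values of the preceding labels.
            augmented = [x + labels[:j] for x, labels in zip(X, Y)]
            model.fit(augmented, [labels[j] for labels in Y])
        return self

    def predict(self, X: List[Vector]) -> List[Vector]:
        predictions = []
        for x in X:
            labels: Vector = []
            for model in self.models:
                # Prediction feeds the earlier predictions down the chain.
                labels.append(model.predict_one(x + labels))
            predictions.append(labels)
        return predictions


# Toy data (an assumption for illustration): label 0 is the AND of the two
# features, label 1 is their XOR, so label 1 depends on label 0.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[0, 0], [0, 1], [0, 1], [1, 0]]

chain_predictions = ClassifierChain(n_labels=2).fit(X, Y).predict(X)

# Baseline that treats label 1 on its own, without access to label 0.
independent = Perceptron().fit(X, [labels[1] for labels in Y])
independent_predictions = [independent.predict_one(x) for x in X]
```

In this toy setup the second label is not linearly separable from the features alone, so the independent baseline cannot fit it, while the chained model can because it also sees the first label. Real aspect categories and text features would of course behave less cleanly.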
Index Terms
- Arabic Aspect Category Detection for Hotel Reviews based on Data Augmentation and Classifier Chains