Abstract
User-generated comments play a central role in hotel booking, particularly in the fast-changing online travel planning and booking industry. This research presents a system that collects and organizes reviews from Booking’s website, with a specific emphasis on tourist cities in Morocco. The methodology uses XLNet to analyze reviews written in Arabic and French and combines four deep learning models: LSTM, CNN, GRU, and BiLSTM. These models are merged through a stacking ensemble, and a neural topic model extracts the significant themes from the reviews. This strategy demonstrates a 0.975 increase in accuracy compared to traditional methods such as GPT and BERT. In addition, the study methodically investigates the influence of dataset scaling by comparing results at different levels of dataset scaling.
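The stacking step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each of the four base models (LSTM, CNN, GRU, BiLSTM) has already produced a positive-class probability for every held-out review, and it uses a plain logistic-regression meta-learner, which is an assumption made here for illustration and not necessarily the authors' choice. The probabilities, labels, and the function names `fit_meta_learner` and `stacked_predict` are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_meta_learner(base_probs, labels, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-learner by batch gradient descent.

    base_probs: one row per review, each row holding the positive-class
                probability from every base model (held-out predictions).
    labels:     gold sentiment labels (1 = positive, 0 = negative).
    """
    n_models = len(base_probs[0])
    n_rows = len(labels)
    weights = [0.0] * n_models
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for row, y in zip(base_probs, labels):
            p = sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)
            err = p - y  # gradient of the log-loss w.r.t. the logit
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n_rows for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_rows
    return weights, bias

def stacked_predict(weights, bias, row):
    """Combine the base-model probabilities for one review into a label."""
    score = sum(w * x for w, x in zip(weights, row)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0

# Made-up held-out probabilities, columns: [LSTM, CNN, GRU, BiLSTM].
# In rows 3 and 6 the second model disagrees with the other three.
HELD_OUT_PROBS = [
    [0.90, 0.80, 0.70, 0.85],
    [0.80, 0.90, 0.75, 0.70],
    [0.90, 0.40, 0.80, 0.70],
    [0.20, 0.10, 0.30, 0.15],
    [0.10, 0.20, 0.25, 0.30],
    [0.30, 0.60, 0.20, 0.10],
]
LABELS = [1, 1, 1, 0, 0, 0]

weights, bias = fit_meta_learner(HELD_OUT_PROBS, LABELS)
train_preds = [stacked_predict(weights, bias, row) for row in HELD_OUT_PROBS]
```

In a full pipeline the rows of `HELD_OUT_PROBS` would come from cross-validated predictions of the trained base networks, so that the meta-learner never sees predictions made on data the base models were fitted on.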
Data availability
No datasets were generated or analysed during the current study.
Author information
Contributions
N.H.: Conceptualization, Methodology, Resources, Software, Data curation, Writing - original draft preparation, Revisions. Hi.N.: Resources, Visualization, Investigation, Data curation, Writing - original draft preparation, Revisions.
Ethics declarations
Competing interests
The authors declare no competing interests.
About this article
Cite this article
Habbat, N., Nouri, H. Unlocking travel narratives: a fusion of stacking ensemble deep learning and neural topic modeling for enhanced tourism comment analysis. Soc. Netw. Anal. Min. 14, 82 (2024). https://doi.org/10.1007/s13278-024-01256-3