Unlocking travel narratives: a fusion of stacking ensemble deep learning and neural topic modeling for enhanced tourism comment analysis

  • Original Article
Social Network Analysis and Mining

Abstract

User-generated comments are crucial in hotel booking, especially in the fast-changing online travel planning and booking industry. Our research presents a system that collects and organizes reviews from Booking's website, with a specific focus on tourist cities in Morocco. The methodology uses XLNet to analyze Arabic and French text and combines four deep learning models (LSTM, CNN, GRU, and BiLSTM) through a stacking ensemble technique. A neural topic model then extracts significant themes from the reviews. This strategy demonstrates a significant 0.975 increase in accuracy compared to traditional methods such as GPT and BERT. In addition, the study methodically investigates the influence of dataset scaling by comparing results across different scaling levels.
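To make the stacking idea in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation: the four toy scorers are hypothetical stand-ins for the LSTM, CNN, GRU, and BiLSTM base models, the word lists and sample reviews are invented for illustration, and the meta-learner is assumed to be a plain logistic regression fitted on a held-out split. Only the overall pattern (base-model outputs become the input features of a second-level model) reflects the technique named above.

```python
import math

# Hypothetical word lists; a real system would use learned XLNet features.
POSITIVE = {"clean", "friendly", "great", "excellent", "comfortable"}
NEGATIVE = {"dirty", "rude", "noisy", "terrible", "broken"}


def _counts(text):
    """Return (positive hits, negative hits, token count) for a review."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos, neg, max(len(tokens), 1)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Four toy scorers standing in for the LSTM, CNN, GRU, and BiLSTM base models.
# Each returns a positive-class score in [0, 1].
def scorer_a(text):
    pos, _, n = _counts(text)
    return pos / n


def scorer_b(text):
    _, neg, n = _counts(text)
    return 1.0 - neg / n


def scorer_c(text):
    pos, neg, _ = _counts(text)
    return sigmoid(pos - neg)


def scorer_d(text):
    pos, neg, _ = _counts(text)
    return (pos + 1) / (pos + neg + 2)  # smoothed positive/negative ratio


BASE_MODELS = [scorer_a, scorer_b, scorer_c, scorer_d]


def base_outputs(text):
    """Level-0 step: collect one score per base model."""
    return [model(text) for model in BASE_MODELS]


def fit_meta_learner(texts, labels, lr=0.5, epochs=500):
    """Level-1 step: logistic regression trained by SGD on base-model scores."""
    rows = [base_outputs(t) for t in texts]
    weights = [0.0] * len(BASE_MODELS)
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict_proba(text, weights, bias):
    """Stacked prediction: meta-learner applied to the base-model scores."""
    x = base_outputs(text)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Invented held-out reviews used only to fit the meta-learner (1 = positive).
HELD_OUT_TEXTS = [
    "great clean room friendly staff",
    "dirty room rude staff",
    "excellent breakfast comfortable bed",
    "noisy street broken shower terrible",
]
HELD_OUT_LABELS = [1, 0, 1, 0]

if __name__ == "__main__":
    w, b = fit_meta_learner(HELD_OUT_TEXTS, HELD_OUT_LABELS)
    print(predict_proba("friendly staff great location", w, b))
```

Fitting the meta-learner on data the base models were not trained on is the usual way to keep the second level from simply memorising base-model overfitting; with trainable base networks, out-of-fold predictions would play the role of the held-out split used here.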

Data availability

No datasets were generated or analysed during the current study.

Author information

Contributions

N.H.: Conceptualization, Methodology, Resources, Software, Data curation, Writing - Original draft preparation, Revisions. Hi.N.: Resources, Visualization, Investigation, Data curation, Writing - Original draft preparation, Revisions.

Corresponding author

Correspondence to Nassera Habbat.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Habbat, N., Nouri, H. Unlocking travel narratives: a fusion of stacking ensemble deep learning and neural topic modeling for enhanced tourism comment analysis. Soc. Netw. Anal. Min. 14, 82 (2024). https://doi.org/10.1007/s13278-024-01256-3

  • DOI: https://doi.org/10.1007/s13278-024-01256-3
