Automatic analysis of textual hotel reviews

  • Original Research
Information Technology & Tourism

Abstract

Social media and consumer-generated content continue to grow and to shape the hospitality domain. Consumers write online reviews to express their level of satisfaction with a hotel and to inform other consumers about their stay. A number of websites specialized in tourism and hospitality have flourished on the Web (e.g. Tripadvisor). The rapid growth of these data-generating sources demands new tools to handle them. To cope with large volumes of customer-generated reviews and comments, Natural Language Processing (NLP) tools have become necessary to automatically process and manage textual customer reviews (e.g. to perform Sentiment Analysis). This work describes OpeNER, an NLP platform applied to the hospitality domain to automatically process customer-generated textual content and extract valuable information from it. The platform consists of a set of free, Open Source NLP tools for text analysis, built on a modular architecture that eases modification and extension. Training and evaluation were performed on a set of manually annotated hotel reviews gathered from websites such as Zoover and HolidayCheck.

Notes

  1. The complete information can be found at http://www.opener-project.eu.

  2. https://opennlp.apache.org/index.html.

  3. https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki.

  4. The KAF acronym formerly stood for Kyoto Annotation Format, after the project in which the first version of KAF was designed. KAF has since evolved, and the letter K now stands for “Knowledge”.

  5. https://github.com/shuyo/language-detection.

  6. https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html.

  7. https://code.google.com/p/semanticvectors/.

  8. http://www.zoover.com.

  9. http://www.holidaycheck.com/.

  10. http://tour-pedia.org/about/.


Author information

Corresponding author

Correspondence to Aitor García-Pablos.

About this article

Cite this article

García-Pablos, A., Cuadros, M. & Linaza, M.T. Automatic analysis of textual hotel reviews. Inf Technol Tourism 16, 45–69 (2016). https://doi.org/10.1007/s40558-015-0047-7

  • DOI: https://doi.org/10.1007/s40558-015-0047-7
