Automatic analysis of textual hotel reviews

García-Pablos, Aitor; Cuadros, Montse; Linaza, Maria Teresa

doi:10.1007/s40558-015-0047-7

Original Research
Published: 22 December 2015

Volume 16, pages 45–69, (2016)
Information Technology & Tourism

Aitor García-Pablos ORCID: orcid.org/0000-0001-9882-7521¹,
Montse Cuadros² &
Maria Teresa Linaza¹

Abstract

Social media and consumer-generated content continue to grow and to affect the hospitality domain. Consumers write online reviews to express their level of satisfaction with a hotel and to inform other consumers about their stay. A number of websites specialized in tourism and hospitality have flourished on the Web (e.g. Tripadvisor). The tremendous growth of these data-generating sources demands new tools to deal with them. To cope with large volumes of customer-generated reviews and comments, Natural Language Processing (NLP) tools have become necessary to automatically process and manage textual customer reviews (e.g. to perform Sentiment Analysis). This work describes OpeNER, an NLP platform applied to the hospitality domain to automatically process customer-generated textual content and extract valuable information from it. The platform consists of a set of free, Open Source NLP tools for text analysis, built on a modular architecture that eases modification and extension. Training and evaluation were performed on a set of manually annotated hotel reviews gathered from websites such as Zoover and HolidayCheck.

Notes

The complete information can be found at http://www.opener-project.eu.
https://opennlp.apache.org/index.html.
https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki.
The KAF acronym formerly stood for Kyoto Annotation Format, after the project in which the first version of KAF was designed. KAF has since evolved, and the letter K now stands for “Knowledge”.
https://github.com/shuyo/language-detection.
https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html.
https://code.google.com/p/semanticvectors/.
http://www.zoover.com.
http://www.holidaycheck.com/.
http://tour-pedia.org/about/.

Author information

Authors and Affiliations

Department of eTourism and Cultural Heritage, Vicomtech-IK4, San Sebastián, Spain
Aitor García-Pablos & Maria Teresa Linaza
Department of Human Speech and Language Technologies, Vicomtech-IK4, San Sebastián, Spain
Montse Cuadros

Corresponding author

Correspondence to Aitor García-Pablos.

About this article

Cite this article

García-Pablos, A., Cuadros, M. & Linaza, M.T. Automatic analysis of textual hotel reviews. Inf Technol Tourism 16, 45–69 (2016). https://doi.org/10.1007/s40558-015-0047-7

Received: 22 April 2015
Accepted: 10 December 2015
Published: 22 December 2015
Issue Date: March 2016
DOI: https://doi.org/10.1007/s40558-015-0047-7
