Assessing Reliability of Social Media Data: Lessons from Mining TripAdvisor Hotel Reviews

  • Conference paper
Information and Communication Technologies in Tourism 2017

Abstract

As an emerging research paradigm, big data analytics has been gaining currency in various fields. However, the existing hospitality and tourism literature rarely discusses data quality, which may affect the validity and generalizability of research findings. This study examines the reliability of online hotel reviews on TripAdvisor by developing a text classifier that predicts travel purpose (i.e., business versus leisure) from the textual content of reviews. The classifier is tested over a range of cities and data sizes to examine its sensitivity to data samples. The findings show that, while the classifier’s performance is fairly consistent across different sets of cities, it varies with data size and sampling method. More importantly, a considerable amount of noise is found in the data, which leads to misclassification. Furthermore, a novel approach is developed to address the misclassification problem resulting from data noise. This study reveals important data quality issues and contributes to the theoretical foundations of social media analytics in hospitality and tourism.
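
To make the classification task concrete, the following is a minimal sketch of a business-versus-leisure text classifier. The abstract does not name the algorithm the authors used, so multinomial Naive Bayes with add-one smoothing is an assumption made purely for illustration. The function names (`tokenize`, `train`, `predict`) and the four toy reviews are hypothetical and are not taken from the study's data.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy reviews, invented for illustration only.
TOY_REVIEWS = [
    ("great conference room and fast wifi for meeting", "business"),
    ("close to the office good desk for work", "business"),
    ("kids loved the pool and the beach", "leisure"),
    ("romantic vacation with lovely beach view", "leisure"),
]


def tokenize(text):
    """Lowercase the text and keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def train(reviews):
    """Count documents per label and words per label (assumed Naive Bayes model)."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in reviews:
        doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return doc_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest smoothed log-posterior score."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        # Log prior: share of training reviews carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TOY_REVIEWS)
print(predict(model, "fast wifi and a desk for work"))
print(predict(model, "beach vacation with kids"))
```

A sensitivity check of the kind the abstract describes could, under the same assumptions, retrain this model on samples of different sizes or from different cities and compare accuracy on held-out reviews.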

Corresponding author

Correspondence to Zheng Xiang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Xiang, Z., Du, Q., Ma, Y., Fan, W. (2017). Assessing Reliability of Social Media Data: Lessons from Mining TripAdvisor Hotel Reviews. In: Schegg, R., Stangl, B. (eds) Information and Communication Technologies in Tourism 2017. Springer, Cham. https://doi.org/10.1007/978-3-319-51168-9_45
