Abstract
As an emerging research paradigm, big data analytics has been gaining currency in various fields. However, the existing hospitality and tourism literature offers little discussion of data quality, which may affect the validity and generalizability of research findings. This study examines the reliability of online hotel reviews on TripAdvisor by developing a text classifier that predicts travel purpose (i.e., business versus leisure) from the textual content of reviews. The classifier is tested over a range of cities and data sizes to examine its sensitivity to data samples. The findings show that, while the classifier's performance is fairly consistent across different sets of cities, it varies with data size and sampling method. More importantly, a considerable amount of noise is found in the data, which leads to misclassification. Furthermore, a novel approach is developed to address the misclassification problem resulting from data noise. This study reveals important data quality issues and contributes to the theoretical foundations of social media analytics in hospitality and tourism.
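The abstract does not name the classifier or describe the authors' noise-handling approach. The following minimal sketch therefore assumes a multinomial naive Bayes model, a common baseline for text classification and not necessarily the method used in the study. It shows three ideas from the abstract: predicting business versus leisure from review text, checking how accuracy changes with training-set size, and flagging reviews whose self-reported travel purpose the model strongly contradicts (a generic "confident disagreement" heuristic, not the paper's novel approach). All names (`MultinomialNB`, `flag_suspect_labels`, `accuracy_by_sample_size`) are hypothetical.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters/apostrophes (deliberately simple)."""
    return re.findall(r"[a-z']+", text.lower())


class MultinomialNB:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, doc):
        """Unnormalized log posterior for each class."""
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # class prior
            for tok in tokenize(doc):
                if tok not in self.vocab:
                    continue  # skip words never seen in training
                score += math.log((self.word_counts[c][tok] + 1) / (self.total_words[c] + v))
            scores[c] = score
        return scores

    def predict_proba(self, doc):
        """Normalize log scores into probabilities (log-sum-exp for stability)."""
        scores = self.log_scores(doc)
        m = max(scores.values())
        exp = {c: math.exp(s - m) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def predict(self, doc):
        proba = self.predict_proba(doc)
        return max(proba, key=proba.get)


def flag_suspect_labels(model, docs, labels, threshold=0.9):
    """Indices where the model gives the reported label less than 1 - threshold."""
    suspects = []
    for i, (doc, label) in enumerate(zip(docs, labels)):
        if model.predict_proba(doc).get(label, 0.0) < 1 - threshold:
            suspects.append(i)
    return suspects


def accuracy_by_sample_size(train_docs, train_labels, test_docs, test_labels, sizes, seed=0):
    """Train on a random subset of each size; report accuracy on a fixed test set."""
    rng = random.Random(seed)
    indices = list(range(len(train_docs)))
    results = {}
    for n in sizes:
        subset = rng.sample(indices, n)
        model = MultinomialNB().fit([train_docs[i] for i in subset],
                                    [train_labels[i] for i in subset])
        correct = sum(model.predict(d) == y for d, y in zip(test_docs, test_labels))
        results[n] = correct / len(test_docs)
    return results
```

For example, after fitting on a few toy reviews such as "conference meeting client office wifi desk" (business) and "family vacation beach pool kids" (leisure), `model.predict("client meeting and fast wifi")` returns `"business"`. When flagging noise in a real corpus, each review should be scored by a model that was not trained on it (e.g., via cross-validation). Otherwise, a mislabeled review pulls the model toward its own wrong label.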
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Xiang, Z., Du, Q., Ma, Y., Fan, W. (2017). Assessing Reliability of Social Media Data: Lessons from Mining TripAdvisor Hotel Reviews. In: Schegg, R., Stangl, B. (eds) Information and Communication Technologies in Tourism 2017. Springer, Cham. https://doi.org/10.1007/978-3-319-51168-9_45
DOI: https://doi.org/10.1007/978-3-319-51168-9_45
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51167-2
Online ISBN: 978-3-319-51168-9
eBook Packages: Business and Management, Business and Management (R0)