Abstract
In recent years, many articles have been published on user-generated content (UGC) data in tourism and hospitality, in particular on quantitative and qualitative content analysis of travel blogs and online travel reviews (OTRs). In general, researchers have worked on more or less population-representative samples of travel diaries comprising tens or hundreds of files, which are small enough to process manually. However, given the dramatic growth of such data, especially hospitality OTRs, this article proposes a method for semi-automatically downloading, arranging, cleaning, debugging, and analysing large-scale travel blog and OTR data. The main goal is to classify the collected webpages by date and destination and to enable offline content analysis of the text as written by the author. The methodology is applied to about 85,000 diaries of tourists who visited Catalonia between 2004 and 2013, and yields significant content analysis results.
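As a rough illustration of the classification and offline content analysis steps named in the abstract, the following minimal Python sketch strips markup from downloaded pages, groups them by destination and year, and counts word frequencies per group. It is not the authors' implementation: the names `TextExtractor`, `clean_html`, and `keyword_counts`, the tuple layout of the input, and the sample pages in the usage note are all assumptions made for this example.

```python
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def clean_html(html):
    """Return the page text with tags removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def keyword_counts(pages):
    """Group pages by (destination, year) and count words in each group.

    `pages` is an iterable of (destination, year, html) tuples; this
    layout is a hypothetical choice for the sketch.
    """
    groups = defaultdict(Counter)
    for destination, year, html in pages:
        text = clean_html(html).lower()
        # Runs of Unicode letters, so accented place names stay whole.
        words = re.findall(r"[^\W\d_]+", text)
        groups[(destination, year)].update(words)
    return groups
```

For example, calling `keyword_counts` on a list such as `[("Barcelona", 2010, "<p>Amazing beach</p>")]` returns a mapping whose key `("Barcelona", 2010)` holds a `Counter` of the words on that page. A real pipeline would also need site-specific rules to locate the date, destination, and author-written text within each page.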
Acknowledgements
This work was supported by the Spanish Ministry of Economy and Competitiveness [Grant id.: GLOBALTUR CSO2011-23004 / GEOG].
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Marine-Roig, E., Clave, S.A. (2015). A Method for Analysing Large-Scale UGC Data for Tourism: Application to the Case of Catalonia. In: Tussyadiah, I., Inversini, A. (eds) Information and Communication Technologies in Tourism 2015. Springer, Cham. https://doi.org/10.1007/978-3-319-14343-9_1
DOI: https://doi.org/10.1007/978-3-319-14343-9_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14342-2
Online ISBN: 978-3-319-14343-9
eBook Packages: Business and Economics, Business and Management (R0)