A Method for Analysing Large-Scale UGC Data for Tourism: Application to the Case of Catalonia

  • Conference paper

Abstract

In recent years, many articles have been published on user-generated content (UGC) data in tourism and hospitality, in particular on quantitative and qualitative content analysis of travel blogs and online travel reviews (OTRs). In general, researchers have worked with more or less population-representative samples of tens or hundreds of travel diaries, a size that allows manual processing. However, because the volume of this content has grown dramatically, especially in the case of hospitality OTRs, this article proposes a method for semi-automatically downloading, arranging, cleaning, debugging, and analysing large-scale travel blog and OTR data. The main goal is to classify the collected webpages by date and destination so that the written text, as provided by the author, can be content-analysed offline. The methodology is applied to about 85,000 diaries of tourists who visited Catalonia between 2004 and 2013, and significant results are obtained in terms of content analysis.
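The abstract does not describe the implementation, so the following Python sketch only illustrates the general idea of classifying already-downloaded pages by destination and date and keeping the author's text for offline analysis. Everything in it is an assumption made for illustration: the `(destination, iso_date, html)` record layout, the function names `extract_text` and `classify_pages`, and the choice to group by year are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Return the visible text of an HTML page as one string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def classify_pages(pages):
    """Group cleaned page text by (destination, year).

    `pages` is an iterable of (destination, iso_date, html) tuples;
    this record layout is hypothetical, chosen for the sketch.
    """
    groups = defaultdict(list)
    for destination, iso_date, html in pages:
        year = int(iso_date[:4])  # assumes dates look like "YYYY-MM-DD"
        groups[(destination, year)].append(extract_text(html))
    return dict(groups)
```

Each group can then be passed to whatever content-analysis step is preferred, for example a word-frequency count per destination and year.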

Acknowledgements

This work was supported by the Spanish Ministry of Economy and Competitiveness [Grant id.: GLOBALTUR CSO2011-23004 / GEOG].

Author information

Corresponding author

Correspondence to Estela Marine-Roig.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Marine-Roig, E., Clave, S.A. (2015). A Method for Analysing Large-Scale UGC Data for Tourism: Application to the Case of Catalonia. In: Tussyadiah, I., Inversini, A. (eds) Information and Communication Technologies in Tourism 2015. Springer, Cham. https://doi.org/10.1007/978-3-319-14343-9_1
