Geolocation Inference Using Twitter Data: A Case Study of COVID-19 in the Contiguous United States

  • Conference paper
Geographical Information Systems Theory, Applications and Management (GISTAM 2020)

Abstract

Under quarantine for the coronavirus disease 2019 (COVID-19), which has been spreading rapidly across the world since it was first identified in Wuhan City, China, in early December 2019, people are sharing their everyday lives via social media more than ever before. Over the last decade, Twitter's growing popularity has made it an increasingly rich source of event-related information, and it has been shown that the emergence and evolution of events can be monitored and analyzed in a timely manner on this platform. Geographic information plays a crucial role in mining social media data and in analyzing and monitoring the spread of an epidemic disease. However, only about 2% of tweets hold accurate geographic information, owing to operational complexity and privacy concerns. Finding effective geolocation inference methods to overcome this geo-tagging restriction is therefore one of the main topics in this research field. In this study, we constructed a geolocation inference method based on all of the potentially location-related metadata of tweets. A crude form of geographic coordinate information can be obtained from every tweet's bounding box, while location-related information can be mined from the textual content, user location and place labels via Named Entity Recognition (NER) techniques. Three coordinate datasets of United States counties are built and used as coordinate references. Models with different data sources are employed to predict the geolocations of COVID-19-related tweets in the contiguous United States. The results show that the models using four data sources, namely textual content, user location, place labels and the bounding box of the place, together with the Digital Boundary's Average (DBA), perform better than the other models. When the area threshold of the bounding box is set to 10,000 km², the best model successfully predicts the geolocation of 90.8% of COVID-19-related tweets, with a mean error distance of 4.824 km and a median error distance of 3.233 km. We conclude that the proposed method enhances the granularity of the geographic information of tweets and makes the surveillance of COVID-19 effective and efficient.
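
The bounding-box threshold and the error-distance metric mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the spherical-Earth approximation, the use of the box centre as the crude coordinate and the input layout (a list of four `[longitude, latitude]` corners, as in a Twitter place bounding box) are all assumptions made for illustration, and the DBA step is not reproduced because the abstract does not define it.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation (assumption)


def bbox_extent(bbox):
    """Return (west, south, east, north) from a list of [lon, lat] corners."""
    lons = [p[0] for p in bbox]
    lats = [p[1] for p in bbox]
    return min(lons), min(lats), max(lons), max(lats)


def bbox_center(bbox):
    """Centre of the box as (lon, lat); a crude stand-in for the tweet location."""
    west, south, east, north = bbox_extent(bbox)
    return (west + east) / 2.0, (south + north) / 2.0


def bbox_area_km2(bbox):
    """Area of a latitude/longitude rectangle on a sphere, in km^2."""
    west, south, east, north = bbox_extent(bbox)
    return (EARTH_RADIUS_KM ** 2
            * math.radians(east - west)
            * (math.sin(math.radians(north)) - math.sin(math.radians(south))))


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two points, usable as an error distance."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def coarse_location(bbox, max_area_km2=10_000.0):
    """Return the box centre if the box is within the area threshold, else None."""
    if bbox_area_km2(bbox) > max_area_km2:
        return None
    return bbox_center(bbox)


if __name__ == "__main__":
    # Hypothetical box of 0.5 x 0.5 degrees, not taken from the paper's data.
    box = [[-74.25, 40.5], [-73.75, 40.5], [-73.75, 41.0], [-74.25, 41.0]]
    print(bbox_area_km2(box), coarse_location(box))
```

Under these assumptions, a box larger than the threshold yields no coarse coordinate, so such a tweet would have to be located from the other sources the abstract lists (textual content, user location and place labels). The sketch ignores boxes that cross the antimeridian, which does not arise for the contiguous United States.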

Acknowledgements

This research is sponsored by China Scholarship Council (CSC).

Author information

Corresponding author

Correspondence to Bingnan Li.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Li, B., Chen, Z., Lim, S. (2021). Geolocation Inference Using Twitter Data: A Case Study of COVID-19 in the Contiguous United States. In: Grueau, C., Laurini, R., Ragia, L. (eds) Geographical Information Systems Theory, Applications and Management. GISTAM 2020. Communications in Computer and Information Science, vol 1411. Springer, Cham. https://doi.org/10.1007/978-3-030-76374-9_8

  • DOI: https://doi.org/10.1007/978-3-030-76374-9_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-76373-2

  • Online ISBN: 978-3-030-76374-9

  • eBook Packages: Computer Science, Computer Science (R0)
