Abstract
Under quarantine for coronavirus disease 2019 (COVID-19), which has spread rapidly across the world since it was first identified in Wuhan City, China, in early December 2019, people are sharing their everyday lives via social media more than ever before. Over the last decade, Twitter's growing popularity has made it an increasingly rich source of event-related information, and it has been shown that the emergence and evolution of events can be monitored and analyzed in a timely manner on this platform. Geographic information plays a crucial role in mining social media data, and in particular in analyzing and monitoring the spread of an epidemic disease; however, only about 2% of tweets carry accurate geographic information, owing to operational complexity and privacy concerns. Overcoming this geo-tagging restriction with effective geolocation inference methods is currently one of the main topics in this research field. In this study, we constructed a geolocation inference method that draws on all of the potentially location-related metadata of tweets. A crude form of geographic coordinate information can be obtained from every tweet's bounding box, while location-related information can be mined from the textual content, user location and place labels via Named Entity Recognition (NER) techniques. Three coordinate datasets of United States counties were built and used as coordinate references. Models with different data sources were employed to predict the geolocations of COVID-19-related tweets in the contiguous United States. Results show that the models with four data sources, namely textual content, user location, place labels and the bounding box of the place, with Digital Boundary's Average (DBA), perform better than other models. When the area threshold of the bounding box is set to 10,000 km², the best model successfully predicts the geolocation of 90.8% of COVID-19-related tweets, with a mean error distance of 4.824 km and a median error distance of 3.233 km. It is concluded that the proposed method enhances the granularity of the geographic information of tweets and makes the surveillance of COVID-19 effective and efficient.
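The quantities named in the abstract (a bounding box, its area threshold, and the error distance in kilometres) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a bounding box given as four `[longitude, latitude]` corners, treats the area threshold as an upper limit above which a box is discarded, and uses the haversine great-circle distance on a spherical Earth. All function names and the sample coordinates are hypothetical.

```python
import math
from statistics import mean, median

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)
AREA_THRESHOLD_KM2 = 10_000  # threshold value quoted in the abstract


def bbox_center(bbox):
    """Return the (lon, lat) midpoint of a box given as [lon, lat] corners."""
    lons = [p[0] for p in bbox]
    lats = [p[1] for p in bbox]
    return ((min(lons) + max(lons)) / 2, (min(lats) + max(lats)) / 2)


def bbox_area_km2(bbox):
    """Approximate area of a lon/lat-aligned box on a sphere.

    Uses R^2 * delta_lon * (sin(lat_max) - sin(lat_min)).
    """
    lons = [p[0] for p in bbox]
    lats = [p[1] for p in bbox]
    dlon = math.radians(max(lons) - min(lons))
    band = math.sin(math.radians(max(lats))) - math.sin(math.radians(min(lats)))
    return EARTH_RADIUS_KM ** 2 * dlon * band


def usable_center(bbox, threshold_km2=AREA_THRESHOLD_KM2):
    """Return the box center, or None if the box exceeds the threshold.

    Assumption: boxes larger than the threshold are too coarse to use.
    """
    if bbox_area_km2(bbox) > threshold_km2:
        return None
    return bbox_center(bbox)


def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1 = map(math.radians, a)
    lon2, lat2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def error_summary(predicted, actual):
    """Mean and median error distance (km) over paired (lon, lat) points."""
    errors = [haversine_km(p, a) for p, a in zip(predicted, actual)]
    return mean(errors), median(errors)


# Hypothetical one-degree box, used only to exercise the functions.
sample_box = [[-74.0, 40.0], [-73.0, 40.0], [-73.0, 41.0], [-74.0, 41.0]]
center = usable_center(sample_box)
```

With true coordinates available for a set of tweets, `error_summary` would yield the two statistics the abstract reports. The spherical area formula is a simplification chosen for brevity; a projected or ellipsoidal computation would be more precise.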
Acknowledgements
This research is sponsored by the China Scholarship Council (CSC).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, B., Chen, Z., Lim, S. (2021). Geolocation Inference Using Twitter Data: A Case Study of COVID-19 in the Contiguous United States. In: Grueau, C., Laurini, R., Ragia, L. (eds) Geographical Information Systems Theory, Applications and Management. GISTAM 2020. Communications in Computer and Information Science, vol 1411. Springer, Cham. https://doi.org/10.1007/978-3-030-76374-9_8
DOI: https://doi.org/10.1007/978-3-030-76374-9_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-76373-2
Online ISBN: 978-3-030-76374-9
eBook Packages: Computer Science (R0)