Abstract
The growing use of social networks in recent years has made an abundance of information on everyday social activities available online, with Twitter being the most prominent and timely source of such information. This has led to a proliferation of tools and applications that help end users and large-scale event organizers to better plan and manage their activities. When analyzing information that originates from social networks, an important aspect is its geolocalization, i.e., the geographic coordinates associated with it, which several applications require (e.g., applications on trending venues or traffic jams). Unfortunately, only a very small percentage of Twitter posts are geotagged, which significantly restricts the applicability and utility of such applications. In this work, we address this problem by proposing a framework for geolocating tweets that are not geotagged. Our solution is general and estimates the location from which a post was generated by exploiting the similarities in content between this post and a set of geotagged tweets, as well as their time-evolution characteristics. Contrary to previous approaches, our framework aims at providing accurate geolocation estimates at a fine grain (i.e., within a city). The experimental evaluation with real data demonstrates the efficiency and effectiveness of our approach.
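The content-similarity idea summarized above can be illustrated with a minimal sketch. It is not the model used in this paper: the uniform grid of cells, the bag-of-words representation, the cosine similarity, and all function names (`build_cell_profiles`, `estimate_cell`) are assumptions made for illustration only, and the time-evolution characteristics are left out entirely.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def cell_of(lat, lon, cell_size=0.01):
    """Map coordinates to a cell of a uniform grid (hypothetical discretization)."""
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))


def build_cell_profiles(geotagged, cell_size=0.01):
    """Aggregate the words of geotagged tweets into one bag of words per cell.

    geotagged: iterable of (text, lat, lon) tuples.
    """
    profiles = defaultdict(Counter)
    for text, lat, lon in geotagged:
        profiles[cell_of(lat, lon, cell_size)].update(tokenize(text))
    return profiles


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def estimate_cell(text, profiles):
    """Return (cell, score) for the most similar cell, or (None, 0.0) if no word overlaps."""
    query = Counter(tokenize(text))
    best_cell, best_score = None, 0.0
    for cell, profile in profiles.items():
        score = cosine(query, profile)
        if score > best_score:
            best_cell, best_score = cell, score
    return best_cell, best_score


# Made-up example data: texts and coordinates are purely illustrative.
geotagged = [
    ("concert at the stadium tonight", 45.478, 9.124),
    ("stadium crowd is loud", 45.478, 9.124),
    ("aperitivo near the duomo", 45.464, 9.190),
]
profiles = build_cell_profiles(geotagged)
cell, score = estimate_cell("heading to the stadium", profiles)
```

In this toy setting the non-geotagged post is assigned to the cell whose aggregated geotagged content it resembles most. A realistic system would also need to account for how the content of each location changes over time, which this sketch does not attempt.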
Notes
For the rest of this paper, we will use the terms geotagged and geolocalized interchangeably.
This paper extends and improves on our earlier results (Paraskevopoulos and Palpanas 2015).
Earlier studies have shown that techniques and models built for geotagged data indeed generalize to non-geotagged data, since geotagged and non-geotagged tweets have similar data characteristics (Han et al. 2014).
We note that the QL results reported here are much better than those reported in our earlier study (Paraskevopoulos and Palpanas 2015). This is due to the different experimental setup (i.e., sliding windows) that we now use for all algorithms, which resulted in an increased number of windows with a high number of tweets, leading to higher execution times and better models.
References
Abdelhaq H, Sengstock C, Gertz M (2013) Eventweet: online localized event detection from twitter. In: Proceedings of the VLDB Endowment, vol 6, no 12
Ajao O, Hong J, Liu W (2015) A survey of location inference techniques on twitter. J Inf Sci 41(6):855–864
Balduini M, Bocconi S, Bozzon A, Della Valle E, Huang Y, Oosterman J, Palpanas T, Tsytsarau M (2014) A case study of active, continuous and predictive social media analytics for smart city. In: ISWC workshop on semantics for smarter cities (S4SC)
Balduini M, Della Valle E, Dell'Aglio D, Tsytsarau M, Palpanas T, Confalonieri C (2013) Social listening of city scale events using the streaming linked data framework. In: ISWC
Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3:993–1022
Chang HW, Lee D, Eltaher M, Lee J (2012) @Phillies tweeting from philly? Predicting twitter user locations with spatial word usage. In: ASONAM
Cheng Z, Caverlee J, Lee K (2010) You are where you tweet: a content-based approach to geo-locating twitter users. In: CIKM
Crooks A, Croitoru A, Stefanidis A, Radzikowski J (2013) #Earthquake: Twitter as a distributed sensor system. Trans GIS 17(1):124–147
Earle PS, Bowden DC, Guy M (2012) Twitter earthquake detection: earthquake monitoring in a social world. Ann Geophys 54(6):708–715
Eisenstein J, O’Connor B, Smith NA, Xing EP (2010) A latent variable model for geographic lexical variation. In: EMNLP
Facebook. https://www.facebook.com/
Frias-Martinez V, Soto V, Hohwald H, Frias-Martinez E (2012) Characterizing urban landscapes using geolocated tweets. In: SocialCom-PASSAT
Google+. https://plus.google.com
Han B, Cook P, Baldwin T (2014) Text-based twitter user geolocation prediction. J Artif Intell Res 49:451–500
Hossain N, Hu T, Feizi R, Zheng D, White AM, Luo J, Kautz H (2016) Precise localization of homes and activities: detecting drinking-while-tweeting patterns in communities. In: Tenth international AAAI conference on web and social media, Cologne, Germany, May 17–20, 2016, pp 587–590
Ikawa Y, Enoki M, Tatsubori M (2012) Location inference using microblog messages. In: Proceedings of the 21st international conference companion on World Wide Web. ACM, pp 687–690
Kinsella S, Murdock V, O’Hare N (2011) I’m eating a sandwich in glasgow: modeling locations with tweets. In: SMUC
Leetaru K, Wang S, Cao G, Padmanabhan A, Shook E (2013) Mapping the global twitter heartbeat: the geography of twitter. First Monday 18(5). doi:10.5210/fm.v18i5.4366
Li C, Sun A (2014) Fine-grained location extraction from tweets with temporal awareness. In: SIGIR
Malmi E, Do TMT, Gatica-Perez D (2013) From foursquare to my square: learning check-in behavior from multiple sources. In: ICWSM
Mathioudakis M, Koudas N (2010) Twittermonitor: trend detection over the twitter stream. In: SIGMOD
Murdock V (2011) Your mileage may vary: on the limits of social media. SIGSPATIAL Spec 3:62–66
Paradesi SM (2011) Geotagging tweets using their content. In: FLAIRS conference
Paraskevopoulos P, Dinh TC, Dashdorj Z, Palpanas T, Serafini L (2013) Identification and characterization of human behavior patterns from mobile phone data. In: NetMob
Paraskevopoulos P, Palpanas T (2015) Fine-grained geolocalisation of non-geotagged tweets. In: Proceedings of the 2015 IEEE/ACM international conference on advances in social networks analysis and mining 2015. ACM, pp 105–112
Paraskevopoulos P, Pellegrini G, Palpanas T (2016) When a tweet finds its place: fine-grained tweet geolocalisation. In: International workshop on data science for social good (SoGood), in conjunction with the European conference on machine learning and principles and practice of knowledge discovery (ECML PKDD)
Sakaki T, Okazaki M, Matsuo Y (2010) Earthquake shakes twitter users: real-time event detection by social sensors. In: WWW
Schulz A, Hadjakos A, Paulheim H, Nachtwey J, Mühlhäuser M (2013) A multi-indicator approach for geolocalization of tweets. In: ICWSM
Serdyukov P, Murdock V, Van Zwol R (2009) Placing flickr photos on a map. In: SIGIR
Tsytsarau M, Amer-Yahia S, Palpanas T (2013) Efficient sentiment correlation for large-scale demographics. In: SIGMOD
Tsytsarau M, Palpanas T (2014) Nia: system for news impact analytics. In: KDD workshop on interactive data exploration and analytics (IDEA)
Tsytsarau M, Palpanas T (2012) Survey on mining subjective data on the web. Data Min Knowl Discov 24:478–514
Tsytsarau M, Palpanas T, Castellanos M (2014) Dynamics of news events and social media reaction. In: SIGKDD
Tsytsarau M, Palpanas T, Denecke K (2010) Scalable discovery of contradictions on the web. In: WWW
Tsytsarau M, Palpanas T, Denecke K (2011) Scalable detection of sentiment-based contradictions. In: DiversiWeb, WWW
Twitter. https://twitter.com
Van Canneyt S, Van Laere O, Schockaert S, Dhoedt B (2012) Using social media to find places of interest: a case study. In: SIGSPATIAL (GEOCROWD)
Yuan Q, Cong G, Ma Z, Sun A, Thalmann NM (2013) Who, where, when and what: discover spatio-temporal topics for twitter users. In: SIGKDD
Zafarani R, Liu H (2015) Evaluation without ground truth in social media research. Commun ACM 58(6):54–60
Acknowledgments
This work was supported by a fellowship from Telecom Italia.
Cite this article
Paraskevopoulos, P., Palpanas, T. Where has this tweet come from? Fast and fine-grained geolocalization of non-geotagged tweets. Soc. Netw. Anal. Min. 6, 89 (2016). https://doi.org/10.1007/s13278-016-0400-7
DOI: https://doi.org/10.1007/s13278-016-0400-7