Where has this tweet come from? Fast and fine-grained geolocalization of non-geotagged tweets

  • Original Article

Social Network Analysis and Mining

Abstract

The growing use of social networks in recent years has made an abundance of information on many aspects of everyday social activity available online, and the most prominent and timely source of such information is Twitter. This has led to a proliferation of tools and applications that help end users and organizers of large-scale events plan and manage their activities. When analyzing information that originates from social networks, an important aspect is its geographic location, i.e., its geolocalization, which several applications require (e.g., those that track trending venues or traffic jams). Unfortunately, only a very small percentage of Twitter posts are geotagged, which significantly limits the applicability and usefulness of such applications. In this work, we address this problem by proposing a framework for geolocating tweets that are not geotagged. Our solution is general: it estimates the location from which a post was generated by exploiting the content similarity between this post and a set of geotagged tweets, as well as their time-evolution characteristics. Unlike previous approaches, our framework aims to provide accurate geolocation estimates at a fine granularity (i.e., within a city). The experimental evaluation with real data demonstrates the efficiency and effectiveness of our approach.
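The abstract describes the approach only at a high level. As a rough illustration of the general idea of locating a non-geotagged tweet by comparing its content with recent geotagged tweets, the sketch below divides the city into a square grid, builds a TF-IDF term profile per cell from the geotagged tweets posted in a recent time window, and assigns the new tweet to the cell with the highest cosine similarity. This is not the authors' algorithm: the grid discretization, the TF-IDF/cosine scoring, the one-hour default window, the 0.01-degree cell size, and all function names (`geolocate`, `build_cell_profiles`, `weigh`, `cosine`, `tokenize`) are hypothetical choices made for illustration, and time evolution is modeled only crudely, by using the most recent window.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into word tokens, keeping # and @ prefixes."""
    return re.findall(r"[#@]?\w+", text.lower())


def build_cell_profiles(geotagged, now, window, cell_size):
    """Aggregate term counts per grid cell from geotagged tweets posted in [now - window, now].

    geotagged: iterable of (timestamp, lat, lon, text) tuples.
    cell_size: side of a square grid cell in degrees (a hypothetical discretization).
    """
    profiles = defaultdict(Counter)
    for ts, lat, lon, text in geotagged:
        if now - window <= ts <= now:
            cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
            profiles[cell].update(tokenize(text))
    return profiles


def weigh(counts, idf):
    """Turn raw term counts into TF-IDF weights; terms unseen in any cell get weight 0."""
    return {term: c * idf.get(term, 0.0) for term, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def geolocate(text, geotagged, now, window=3600, cell_size=0.01):
    """Return (cell, score) for the grid cell whose recent tweets are most similar
    to `text`, or None if no recent geotagged tweet shares a weighted term with it."""
    profiles = build_cell_profiles(geotagged, now, window, cell_size)
    if not profiles:
        return None
    n_cells = len(profiles)
    # Document frequency: in how many cells does each term occur?
    df = Counter(term for profile in profiles.values() for term in profile)
    # Smoothed IDF (+1) so that terms present in every cell still contribute.
    idf = {term: math.log(n_cells / d) + 1.0 for term, d in df.items()}
    query = weigh(Counter(tokenize(text)), idf)
    scored = [(cell, cosine(query, weigh(profile, idf))) for cell, profile in profiles.items()]
    best_cell, best_score = max(scored, key=lambda pair: pair[1])
    return (best_cell, best_score) if best_score > 0 else None
```

In this sketch, shrinking `cell_size` trades evidence for granularity: smaller cells give finer, within-city estimates, but each cell profile is then built from fewer geotagged tweets.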

[Figs. 1–20: images not reproduced]

Notes

  1. For the rest of this paper, we will use the terms geotagged and geolocalized interchangeably.

  2. This paper extends and improves on our earlier results (Paraskevopoulos and Palpanas 2015).

  3. Earlier studies have shown that techniques and models built for geotagged data indeed generalize to non-geotagged data, since geotagged and non-geotagged tweets have similar data characteristics (Han et al. 2014).

  4. We note that the QL results reported here are much better than those reported in our earlier study (Paraskevopoulos and Palpanas 2015). This is due to the different experimental setup (i.e., sliding windows) that we now use for all algorithms, which results in more windows containing a large number of tweets, leading to longer execution times but also to better models.
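Note 4 attributes the change in the QL results to the sliding-window setup. The minimal sketch below illustrates only the counting argument behind it: with the same window length, windows that advance by a step smaller than their length overlap, so the same stream yields more windows that each contain many tweets, and each of those windows has to be processed. The window length, step, the one-tweet-per-hour stream, and the helper names `sliding_windows` and `busy_windows` are hypothetical; the paper's actual parameters are not given in this section.

```python
def sliding_windows(timestamps, start, end, length, step):
    """Split [start, end) into windows [w, w + length) that advance by `step`.

    step == length gives non-overlapping (tumbling) windows; step < length gives
    overlapping (sliding) windows. Returns a list of (window_start, tweets_in_window).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    ts = sorted(timestamps)
    result = []
    w = start
    while w + length <= end:
        result.append((w, [t for t in ts if w <= t < w + length]))
        w += step
    return result


def busy_windows(windows, min_tweets):
    """Count windows holding at least `min_tweets` tweets."""
    return sum(1 for _, tweets in windows if len(tweets) >= min_tweets)


# Hypothetical example: one tweet per hour over a day, 6-hour windows.
hourly = range(24)
tumbling = sliding_windows(hourly, 0, 24, length=6, step=6)
sliding = sliding_windows(hourly, 0, 24, length=6, step=1)
print(len(tumbling), busy_windows(tumbling, 6))  # 4 4
print(len(sliding), busy_windows(sliding, 6))    # 19 19
```

Here each tweet belongs to up to `length / step` sliding windows, so the number of windows, and the work spent on them, grows as the step shrinks.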

References

  • Abdelhaq H, Sengstock C, Gertz M (2013) EvenTweet: online localized event detection from Twitter. In: Proceedings of the VLDB Endowment, vol 6, no 12

  • Ajao O, Hong J, Liu W (2015) A survey of location inference techniques on Twitter. J Inf Sci 41(6):855–864

  • Balduini M, Bocconi S, Bozzon A, Della Valle E, Huang Y, Oosterman J, Palpanas T, Tsytsarau M (2014) A case study of active, continuous and predictive social media analytics for smart city. In: ISWC workshop on semantics for smarter cities (S4SC)

  • Balduini M, Della Valle E, Dell’Aglio D, Tsytsarau M, Palpanas T, Confalonieri C (2013) Social listening of city scale events using the streaming linked data framework. In: ISWC

  • Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022

  • Chang HW, Lee D, Eltaher M, Lee J (2012) @Phillies tweeting from Philly? Predicting Twitter user locations with spatial word usage. In: ASONAM

  • Cheng Z, Caverlee J, Lee K (2010) You are where you tweet: a content-based approach to geo-locating Twitter users. In: CIKM

  • Crooks A, Croitoru A, Stefanidis A, Radzikowski J (2013) #Earthquake: Twitter as a distributed sensor system. Trans GIS 17(1):124–147

  • Earle PS, Bowden DC, Guy M (2012) Twitter earthquake detection: earthquake monitoring in a social world. Ann Geophys 54(6):708–715

  • Eisenstein J, O’Connor B, Smith NA, Xing EP (2010) A latent variable model for geographic lexical variation. In: EMNLP

  • Facebook. https://www.facebook.com/

  • Frias-Martinez V, Soto V, Hohwald H, Frias-Martinez E (2012) Characterizing urban landscapes using geolocated tweets. In: SocialCom-PASSAT

  • Google+. https://plus.google.com

  • Han B, Cook P, Baldwin T (2014) Text-based Twitter user geolocation prediction. J Artif Intell Res 49:451–500

  • Hossain N, Hu T, Feizi R, Zheng D, White AM, Luo J, Kautz H (2016) Precise localization of homes and activities: detecting drinking-while-tweeting patterns in communities. In: Tenth international AAAI conference on web and social media, Cologne, Germany, May 17–20, 2016, pp 587–590

  • Ikawa Y, Enoki M, Tatsubori M (2012) Location inference using microblog messages. In: Proceedings of the 21st international conference companion on World Wide Web. ACM, pp 687–690

  • Kinsella S, Murdock V, O’Hare N (2011) I’m eating a sandwich in Glasgow: modeling locations with tweets. In: SMUC

  • Leetaru K, Wang S, Cao G, Padmanabhan A, Shook E (2013) Mapping the global Twitter heartbeat: the geography of Twitter. First Monday 18(5). https://doi.org/10.5210/fm.v18i5.4366

  • Li C, Sun A (2014) Fine-grained location extraction from tweets with temporal awareness. In: SIGIR

  • Malmi E, Do TMT, Gatica-Perez D (2013) From Foursquare to my square: learning check-in behavior from multiple sources. In: ICWSM

  • Mathioudakis M, Koudas N (2010) TwitterMonitor: trend detection over the Twitter stream. In: SIGMOD

  • Murdock V (2011) Your mileage may vary: on the limits of social media. SIGSPATIAL Spec 3:62–66

  • Paradesi SM (2011) Geotagging tweets using their content. In: FLAIRS conference

  • Paraskevopoulos P, Dinh TC, Dashdorj Z, Palpanas T, Serafini L (2013) Identification and characterization of human behavior patterns from mobile phone data. In: NetMob

  • Paraskevopoulos P, Palpanas T (2015) Fine-grained geolocalisation of non-geotagged tweets. In: Proceedings of the 2015 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM). ACM, pp 105–112

  • Paraskevopoulos P, Pellegrini G, Palpanas T (2016) When a tweet finds its place: fine-grained tweet geolocalisation. In: International workshop on data science for social good (SoGood), in conjunction with the European conference on machine learning and principles and practice of knowledge discovery in databases (ECML PKDD)

  • Sakaki T, Okazaki M, Matsuo Y (2010) Earthquake shakes Twitter users: real-time event detection by social sensors. In: WWW

  • Schulz A, Hadjakos A, Paulheim H, Nachtwey J, Mühlhäuser M (2013) A multi-indicator approach for geolocalization of tweets. In: ICWSM

  • Serdyukov P, Murdock V, Van Zwol R (2009) Placing Flickr photos on a map. In: SIGIR

  • Tsytsarau M, Amer-Yahia S, Palpanas T (2013) Efficient sentiment correlation for large-scale demographics. In: SIGMOD

  • Tsytsarau M, Palpanas T (2012) Survey on mining subjective data on the web. Data Min Knowl Discov 24:478–514

  • Tsytsarau M, Palpanas T (2014) Nia: system for news impact analytics. In: KDD workshop on interactive data exploration and analytics (IDEA)

  • Tsytsarau M, Palpanas T, Castellanos M (2014) Dynamics of news events and social media reaction. In: SIGKDD

  • Tsytsarau M, Palpanas T, Denecke K (2010) Scalable discovery of contradictions on the web. In: WWW

  • Tsytsarau M, Palpanas T, Denecke K (2011) Scalable detection of sentiment-based contradictions. In: DiversiWeb, WWW

  • Twitter. https://twitter.com

  • Van Canneyt S, Van Laere O, Schockaert S, Dhoedt B (2012) Using social media to find places of interest: a case study. In: SIGSPATIAL (GEOCROWD)

  • Yuan Q, Cong G, Ma Z, Sun A, Thalmann NM (2013) Who, where, when and what: discover spatio-temporal topics for twitter users. In: SIGKDD

  • Zafarani R, Liu H (2015) Evaluation without ground truth in social media research. Commun ACM 58(6):54–60

Acknowledgments

This work was supported by a fellowship from Telecom Italia.

Author information

Correspondence to Pavlos Paraskevopoulos.

About this article

Cite this article

Paraskevopoulos, P., Palpanas, T. Where has this tweet come from? Fast and fine-grained geolocalization of non-geotagged tweets. Soc. Netw. Anal. Min. 6, 89 (2016). https://doi.org/10.1007/s13278-016-0400-7

  • DOI: https://doi.org/10.1007/s13278-016-0400-7
