
Efficient online extraction of keywords for localized events in twitter

GeoInformatica

Abstract

Messages published on social media sites such as Twitter, Facebook, and Foursquare contain a considerable amount of information about real-world events. The timely identification of such events from this huge, unstructured, and noisy user-generated content plays an important role in increasing situation awareness and in supporting applications such as recommendation systems. A large number of these messages carry location information, owing to recent advances in location acquisition techniques. This, in turn, enables location-aware event mining, i.e., the detection and tracking of localized events such as sporting events, demonstrations, or traffic jams. The main building blocks of a localized event are local keywords, that is, keywords that exhibit a surge in usage at the event location. In this paper, we propose an approach that extracts local keywords from a stream of Twitter messages by (1) identifying local keywords and (2) estimating the central location of each keyword. The extraction is performed online using a sliding window over the Twitter stream. Additionally, we address the problem of spatial outliers, which hinder the sound identification of local keywords. Spatial outliers occur when people far away from the location of an event use related keywords in their tweets. We handle this problem by adjusting the spatial distribution of keywords based on their co-occurrence with place names that may refer to the location of an event. To ensure scalability, we use a hierarchical spatial index that gradually prunes the geographic space, so that complex spatial computations can be performed efficiently. Extensive comparative experiments are conducted using Twitter data. The experimental results show that our approach outperforms existing methods in both efficiency and precision.
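The two steps named in the abstract, identifying local keywords and estimating a central location for each, can be illustrated with a minimal sketch. This is not the authors' algorithm: the class name `SlidingKeywordWindow`, the grid resolution `CELL_DEG`, and the thresholds `min_count` and `min_concentration` are hypothetical. A flat grid stands in for the hierarchical spatial index, and the concentration of a keyword in its most frequent cell stands in for the paper's locality measure. The adjustment for spatial outliers via place names is not modelled.

```python
import math
from collections import Counter, defaultdict, deque

CELL_DEG = 0.5  # hypothetical grid resolution in degrees (assumption)


def cell_of(lat, lon):
    """Map a coordinate to a flat grid cell (a stand-in for a hierarchical index)."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def geographic_midpoint(points):
    """Average points as 3D unit vectors, then convert back to (lat, lon) in degrees."""
    x = y = z = 0.0
    for lat, lon in points:
        la, lo = math.radians(lat), math.radians(lon)
        x += math.cos(la) * math.cos(lo)
        y += math.cos(la) * math.sin(lo)
        z += math.sin(la)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    return (math.degrees(math.atan2(z, math.hypot(x, y))),
            math.degrees(math.atan2(y, x)))


class SlidingKeywordWindow:
    """Keeps only the geotagged messages of the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.messages = deque()  # entries: (timestamp, lat, lon, keywords)

    def add(self, timestamp, lat, lon, keywords):
        """Append a message and evict everything that has left the window."""
        self.messages.append((timestamp, lat, lon, tuple(keywords)))
        cutoff = timestamp - self.window_seconds
        while self.messages and self.messages[0][0] <= cutoff:
            self.messages.popleft()

    def local_keywords(self, min_count=3, min_concentration=0.6):
        """Return {keyword: (lat, lon)} for keywords concentrated in one cell."""
        cells = defaultdict(Counter)   # keyword -> message count per grid cell
        points = defaultdict(list)     # keyword -> coordinates of its messages
        for _, lat, lon, keywords in self.messages:
            for kw in set(keywords):   # count each keyword once per message
                cells[kw][cell_of(lat, lon)] += 1
                points[kw].append((lat, lon))
        result = {}
        for kw, counter in cells.items():
            total = sum(counter.values())
            top_cell, top = counter.most_common(1)[0]
            if total >= min_count and top / total >= min_concentration:
                # Estimate the centre from the messages in the dominant cell only.
                in_cell = [p for p in points[kw] if cell_of(*p) == top_cell]
                result[kw] = geographic_midpoint(in_cell)
        return result
```

In this sketch a keyword used three times within one cell is reported with a centre near those messages, while a keyword spread evenly over distant cities is not reported. A production system would also need a burst test against historical usage, which is omitted here.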




Corresponding author

Correspondence to Michael Gertz.


Cite this article

Abdelhaq, H., Gertz, M. & Armiti, A. Efficient online extraction of keywords for localized events in twitter. Geoinformatica 21, 365–388 (2017). https://doi.org/10.1007/s10707-016-0258-x


  • DOI: https://doi.org/10.1007/s10707-016-0258-x
