Abstract
Messages published on social media sites, such as Twitter, Facebook, and Foursquare, implicitly carry a considerable amount of information about real-world events. The timely identification of such events from this huge, unstructured, and noisy user-generated content plays an important role in increasing situation awareness and in supporting applications such as recommendation systems. Interestingly, a large number of these messages are enriched with location information, owing to recent advances in location acquisition techniques. This, in turn, enables location-aware event mining, i.e., the detection and tracking of localized events such as sports events, demonstrations, or traffic jams. The main building blocks of a localized event are local keywords that exhibit a surge in usage at the event location. In this paper, we propose an approach that extracts local keywords from a stream of Twitter messages by (1) identifying local keywords and (2) estimating the central location of each keyword. This extraction is performed online, using a sliding window over the Twitter stream. Additionally, we address the problem of spatial outliers, which hinder the reliable identification of local keywords. Spatial outliers occur when people far away from the location of an event use related keywords in their tweets. We handle this problem by adjusting the spatial distribution of keywords based on their co-occurrence with place names that may refer to the location of an event. To ensure scalability, we use a hierarchical spatial index that gradually prunes the geographic space, so that complex spatial computations can be performed efficiently. Extensive comparative experiments are conducted on Twitter data. The results show that our approach outperforms existing methods in both efficiency and precision.
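The abstract outlines an online pipeline: keep a sliding window over the stream, collect each keyword's locations, estimate a central location, and counter spatial outliers using co-occurring place names. The Python sketch that follows illustrates these steps under stated assumptions and is not the authors' method. The `TweetWindow` class, the gazetteer lookup that snaps a tweet to a named place, the geometric median (Weiszfeld's algorithm) used as the central-location estimator, and the `locality` score (share of a keyword's occurrences within a radius of that center) are all hypothetical stand-ins, and coordinates are treated as planar.

```python
import math
from collections import deque


class TweetWindow:
    """Geotagged tweets from the last `window` seconds (hypothetical helper).

    Coordinates are assumed to be planar (e.g., projected to km), not raw
    latitude/longitude; `gazetteer` maps place names to coordinates.
    """

    def __init__(self, window, gazetteer=None):
        self.window = window
        self.gazetteer = gazetteer or {}
        self.tweets = deque()  # (timestamp, set of tokens, (x, y)), oldest first

    def add(self, ts, tokens, point):
        self.tweets.append((ts, set(tokens), point))
        # Keep only tweets with timestamps in (ts - window, ts].
        while self.tweets and self.tweets[0][0] <= ts - self.window:
            self.tweets.popleft()

    def keyword_points(self, keyword):
        """Locations of `keyword` in the window.

        Assumed outlier rule: if a tweet also names a known place, use that
        place's coordinates instead of the tweet's own location.
        """
        pts = []
        for _, tokens, point in self.tweets:
            if keyword in tokens:
                places = sorted(t for t in tokens if t in self.gazetteer)
                pts.append(self.gazetteer[places[0]] if places else point)
        return pts


def geometric_median(points, iters=100, eps=1e-9):
    """Weiszfeld iteration; unlike the mean, robust to a few far-away points."""
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iters):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < eps:
                continue  # simplification: ignore a point the estimate sits on
            num_x += px / d
            num_y += py / d
            denom += 1.0 / d
        if denom == 0.0:
            break  # every point coincides with the current estimate
        x, y = num_x / denom, num_y / denom
    return x, y


def locality(points, center, radius):
    """Fraction of a keyword's occurrences within `radius` of its center."""
    return sum(math.dist(p, center) <= radius for p in points) / len(points)


win = TweetWindow(window=3600, gazetteer={"stadium": (0.0, 0.0)})
win.add(0, ["marathon"], (0.0, 0.0))
win.add(10, ["marathon"], (1.0, 0.0))
win.add(20, ["marathon"], (0.0, 1.0))
win.add(30, ["marathon"], (50.0, 50.0))              # spatial outlier
win.add(40, ["marathon", "stadium"], (40.0, -30.0))  # snapped to "stadium"
pts = win.keyword_points("marathon")
center = geometric_median(pts)
print(round(locality(pts, center, radius=5.0), 2))  # prints 0.8
```

In this toy stream, the tweet at (40, -30) that names the stadium is moved to the stadium's coordinates, while the unlabeled tweet at (50, 50) remains an outlier. The geometric median stays inside the cluster, near (0.21, 0.21), so four of the five occurrences lie within 5 units of it; the arithmetic mean, (10.2, 10.2), would be pulled toward the outlier and have no occurrence within that radius.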
Cite this article
Abdelhaq, H., Gertz, M. & Armiti, A. Efficient online extraction of keywords for localized events in twitter. Geoinformatica 21, 365–388 (2017). https://doi.org/10.1007/s10707-016-0258-x