Local trend discovery on real-time microblogs with uncertain locations in tight memory environments

GeoInformatica

Abstract

This paper presents GeoTrend+, a system approach to support scalable local trend discovery on recent microblogs, e.g., tweets, comments, online reviews, and check-ins, that arrive in real time. GeoTrend+ discovers the top-k trending keywords in arbitrary spatial regions from recent microblogs that arrive continuously at high rates and of which a significant portion has uncertain geolocations. GeoTrend+ distinguishes itself from existing techniques in several aspects: (1) It discovers trends in arbitrary spatial regions, e.g., city blocks. (2) It considers both exact geolocations, e.g., accurate latitude/longitude coordinates, and uncertain geolocations, e.g., district-level or city-level locations, which represent a significant portion of the microblogs of past years. (3) It promotes recent microblogs as first-class citizens and optimizes its components to digest a continuous flow of fast data in main memory while removing old data efficiently. (4) It provides various main-memory optimization techniques that distinguish useful from useless data to effectively utilize tight memory resources while maintaining accurate query results on relatively large amounts of data. (5) It supports various trending measures that effectively capture trending items under a variety of definitions that suit different applications. GeoTrend+ limits its scope to real-time data posted during the last T time units. To support its queries efficiently, GeoTrend+ employs an in-memory spatial index that efficiently digests incoming data and expires data that is older than the last T time units. The index also materializes the top-k keywords in different spatial regions so that incoming queries can be processed with low latency. At peak times, the main-memory optimization techniques are employed to shed less important data and sustain high query accuracy with limited memory resources. Experimental results based on real data and queries show that GeoTrend+ scales to support high arrival rates with low query response time, and achieves at least 90% query accuracy even under limited memory resources.

Acknowledgments

Amr Magdy acknowledges the support of the National Science Foundation under Grants Number IIS-1849971, SES-1831615, and CNS-1837577. Mohamed Mokbel acknowledges the support of the National Science Foundation under Grants Number IIS-1525953, CNS-1512877, and IIS-1907855. Walid Aref acknowledges the support of the National Science Foundation under Grants Number III-1815796 and IIS-1910216.

Author information

Corresponding author

Correspondence to Abdulaziz Almaslukh.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Trend Line Slope

GeoTrend+ uses the slope of a statistical linear regression line to measure the trendiness of a keyword. The following lemma derives the equation that determines this trendiness:

Lemma 1

Given the vector of consecutive frequencies of a keyword, \(f = [f_{0},f_{1},...,f_{N}]\), the slope of the keyword trend line can be estimated with the following formula:

$$ Trend_{reg} = \frac{6 {\sum}_{i=1}^{N} [i \times (f_{i}-f_{0})]}{N(N+1)(2N+1)} $$
(3)

Proof

The slope \(Trend_{reg}\) of a simple linear regression line of y on x that passes through the origin is given by the following equation:

$$ Trend_{reg} = \frac{Mean(xy)}{Mean(x^{2})} $$
(4)

Here, Mean(x) is the average value of the vector x, and xy is the vector that results from the element-wise multiplication of the vectors x and y. In GeoTrend+, the values of the vector x are constant, while the vector y contains the frequencies of a keyword W. Thus, the values of x are always \([1,2,3,...,N]\), while the values of y are \([f_{1},f_{2},f_{3},...,f_{N}]\). Since \({\sum }_{i=1}^{N} i^{2} = \frac {N(N+1)(2N+1)}{6}\), \(Mean(x^{2})\) simplifies to \(\frac {(N+1)(2N+1)}{6}\). On the other hand, Mean(xy) can be calculated as \(\frac {{\sum }_{i=1}^{N} i \times f_{i}}{N}\). Substituting both terms into Equation 4 gives:

$$ Trend_{reg} = \frac{\frac{{\sum}_{i=1}^{N} i \times f_{i}}{N}}{\frac{(N+1)(2N+1)}{6}} = \frac{6 {\sum}_{i=1}^{N} i \times f_{i}}{N(N+1)(2N+1)} $$
(5)
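
As a quick sanity check (an illustrative case that is not part of the original proof), consider a keyword whose frequency grows perfectly linearly, i.e., \(f_{i} = c \times i\) for some assumed constant c. Equation 5 then recovers exactly that growth rate:

$$ Trend_{reg} = \frac{6 {\sum}_{i=1}^{N} i \times (c \times i)}{N(N+1)(2N+1)} = \frac{6c \times \frac{N(N+1)(2N+1)}{6}}{N(N+1)(2N+1)} = c $$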

The equation above assumes that the measurement starts at the beginning of the stream and that each keyword W starts from frequency 0. However, in GeoTrend+, we need to account for the starting position of a keyword W by using its previous frequency, namely \(f_{0}\), as the baseline. Subtracting \(f_{0}\) from every \(f_{i}\) modifies the equation above to:

$$ Trend_{reg} = \frac{6 {\sum}_{i=1}^{N} [i \times (f_{i}-f_{0})]}{N(N+1)(2N+1)} $$
(6)
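
The following minimal Python sketch shows how Equation 6 could be evaluated for a single keyword. It is only an illustration under stated assumptions: the function name `trend_slope` and the list-based input are hypothetical and are not taken from the GeoTrend+ implementation.

```python
from typing import Sequence


def trend_slope(freqs: Sequence[float]) -> float:
    """Evaluate Equation 6 for one keyword.

    `freqs` is assumed to hold [f0, f1, ..., fN], where f0 is the
    previous (baseline) frequency and f1..fN are the N consecutive
    frequencies that follow it.
    """
    n = len(freqs) - 1  # N in the lemma
    if n < 1:
        raise ValueError("need f0 and at least one later frequency")
    f0 = freqs[0]
    # Numerator sum of Equation 6: sum over i = 1..N of i * (f_i - f_0)
    weighted = sum(i * (freqs[i] - f0) for i in range(1, n + 1))
    return 6 * weighted / (n * (n + 1) * (2 * n + 1))


if __name__ == "__main__":
    print(trend_slope([5, 7, 9, 11]))  # rising by 2 per step -> 2.0
    print(trend_slope([4, 4, 4]))      # flat keyword -> 0.0
    print(trend_slope([10, 8, 6]))     # falling by 2 per step -> -2.0
```

A positive value indicates a keyword whose frequency is rising relative to its baseline \(f_{0}\), a negative value indicates a declining one, and the magnitude grows with the rate of change.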

Cite this article

Almaslukh, A., Magdy, A., Aly, A.M. et al. Local trend discovery on real-time microblogs with uncertain locations in tight memory environments. Geoinformatica 24, 301–337 (2020). https://doi.org/10.1007/s10707-019-00380-z

  • DOI: https://doi.org/10.1007/s10707-019-00380-z
