Automatic targeted-domain spatiotemporal event detection in twitter

GeoInformatica

Abstract

Twitter has become an important data source for detecting events, and especially for tracking detailed information about events in a specific domain. Previous studies on targeted-domain Twitter information extraction have used supervised learning techniques to identify domain-related tweets; however, the need for extensive manual labeling makes these supervised systems extremely expensive to build and maintain. Moreover, most existing work fails to consider spatiotemporal factors, which are essential attributes of targeted-domain events. In this paper, we propose a semi-supervised method for Automatic Targeted-domain Spatiotemporal Event Detection (ATSED) in Twitter. Given a targeted domain, ATSED first learns tweet labels from historical data and then detects ongoing events from real-time Twitter data streams. Specifically, an efficient label generation algorithm is proposed to automatically recognize tweet labels from domain-related news articles, a customized classifier is created for Twitter data analysis by utilizing tweets’ distinguishing features, and a novel multinomial spatial-scan model is provided to identify geographical locations for detected events. Experiments on 305 million tweets demonstrated the effectiveness of this new approach.
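The abstract does not specify the multinomial spatial-scan model, so the following is only a minimal sketch of the general idea behind a multinomial scan statistic, not the ATSED model itself. It assumes tweets have already been classified into categories and binned into spatial cells; the names `multinomial_llr` and `best_region`, the cell identifiers, and the candidate regions are all hypothetical. Each candidate region is scored by a log-likelihood ratio that compares the category mix inside the region with the mix outside it.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _xlogx_ratio(count: float, total: float) -> float:
    """Return count * log(count / total), using the convention 0 * log(0) = 0."""
    if count <= 0 or total <= 0:
        return 0.0
    return count * math.log(count / total)


def multinomial_llr(inside: Sequence[int], overall: Sequence[int]) -> float:
    """Log-likelihood ratio for one candidate region.

    inside[k]  = number of tweets of category k inside the region
    overall[k] = number of tweets of category k in the whole study area
    The null hypothesis is that the category mix is the same everywhere;
    the alternative lets the mix inside the region differ from the outside.
    """
    outside = [total - inner for inner, total in zip(inside, overall)]
    n_in, n_out, n_all = sum(inside), sum(outside), sum(overall)
    alternative = sum(_xlogx_ratio(c, n_in) for c in inside)
    alternative += sum(_xlogx_ratio(c, n_out) for c in outside)
    null = sum(_xlogx_ratio(c, n_all) for c in overall)
    return alternative - null


def best_region(
    cell_counts: Dict[str, List[int]],
    candidate_regions: List[List[str]],
) -> Tuple[List[str], float]:
    """Return the candidate region (a list of cell ids) with the highest score."""
    n_categories = len(next(iter(cell_counts.values())))
    overall = [
        sum(counts[k] for counts in cell_counts.values())
        for k in range(n_categories)
    ]
    scored = []
    for region in candidate_regions:
        # Aggregate per-category counts over the cells in this region.
        inside = [
            sum(cell_counts[cell][k] for cell in region)
            for k in range(n_categories)
        ]
        scored.append((region, multinomial_llr(inside, overall)))
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical counts per cell: [domain-related tweets, other tweets].
    cells = {"a": [10, 0], "b": [0, 10], "c": [5, 5]}
    region, score = best_region(cells, [["a"], ["c"]])
    print(region, round(score, 3))
```

A region whose category mix matches the overall mix scores zero, and the score grows as the mix inside diverges from the mix outside. In practice the significance of the top-scoring region is usually assessed by Monte Carlo randomization, which this sketch omits.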

Notes

  1. https://blog.twitter.com/2013/celebrating-twitter7.

  2. http://www.milenio.com/cdb/doc/noticias2011/fcd1c695e4a21d7edcae432c9f931ecd?quicktabs_1=2.

  3. https://twitter.com/BicitlanRadio/status/290232591246823425.

  4. https://twitter.com/revistaeneo/status/290185989815676930.

  5. https://dev.twitter.com/rest/public.

  6. http://www.mitre.org/.

  7. In addition to the top three domestic news outlets, the following global news outlets are also included: The New York Times, The Guardian, The Wall Street Journal, The Washington Post, The International Herald Tribune, The Times of London, and Infolatam.

  8. http://www.cs.ucr.edu/tlappas/scripts/STBurst.rar.

  9. https://goo.gl/8wfhkN.

Acknowledgments

This work was supported by the Intelligence Advanced Research Projects Activity (IARPA) via DoI/NBC contract number D12PC000337. The US Government is authorized to reproduce and distribute reprints of this work for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/NBC, or the US Government.

Author information

Corresponding author

Correspondence to Ting Hua.

About this article

Cite this article

Hua, T., Chen, F., Zhao, L. et al. Automatic targeted-domain spatiotemporal event detection in twitter. Geoinformatica 20, 765–795 (2016). https://doi.org/10.1007/s10707-016-0263-0

  • DOI: https://doi.org/10.1007/s10707-016-0263-0
