Event Related Data Collection from Microblog Streams

  • Conference paper
Database and Expert Systems Applications (DEXA 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12924)

Abstract

Many studies have established that microblog streams, e.g., Twitter and Weibo, are leading indicators of emerging events. However, to statistically analyze and discover the emerging trends around these events in microblog message streams, e.g., popularity, sentiments, or aspects, one must identify messages related to an event with high precision and recall. In this paper, we introduce the novel problem of automatically discovering meaningful keyword rules, which help identify the most relevant messages in the context of a given event from fast-moving, high-volume social media streams. For a specified event, such as {#trump} or {#coronavirus}, our technique automatically extracts the most relevant keyword rules to collect related messages with high precision and recall. The rule set is dynamic, and we continuously identify new rules that capture the evolution of the event. Experiments with millions of tweets show that the proposed rule extraction method is highly effective for event-related data collection, achieving precision of up to 99% and up to 4.5X the recall of the baseline system.
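To make the idea of collecting messages with keyword rules concrete, the following is a minimal Python sketch, not the method of the paper. It assumes a hypothetical rule format in which a rule is a set of keywords that must all appear in a message, and it measures precision and recall on a tiny hand-labeled sample. The names `Rule`, `tokenize`, `matches`, and `precision_recall`, as well as the sample messages, are illustrative assumptions.

```python
import re
from typing import FrozenSet, Iterable, List, Set, Tuple

# Assumed rule format (not from the paper): every keyword in the set must occur.
Rule = FrozenSet[str]


def tokenize(message: str) -> Set[str]:
    """Lowercase a message and split it into word and hashtag tokens."""
    return set(re.findall(r"#?\w+", message.lower()))


def matches(message: str, rules: Iterable[Rule]) -> bool:
    """A message is collected if at least one rule is fully contained in it."""
    tokens = tokenize(message)
    return any(rule <= tokens for rule in rules)


def precision_recall(
    labeled: List[Tuple[str, bool]], rules: List[Rule]
) -> Tuple[float, float]:
    """Compute precision and recall of a rule set on (text, is_relevant) pairs."""
    collected = [relevant for text, relevant in labeled if matches(text, rules)]
    true_positives = sum(collected)
    total_relevant = sum(relevant for _, relevant in labeled)
    precision = true_positives / len(collected) if collected else 0.0
    recall = true_positives / total_relevant if total_relevant else 0.0
    return precision, recall


# Made-up sample stream with relevance labels for a "#coronavirus" event.
SAMPLE = [
    ("Breaking: new #coronavirus cases reported", True),
    ("Vaccine rollout starts amid covid surge", True),
    ("Lockdown extended as covid cases rise", True),
    ("My cat loves the new vaccine for fleas", False),
    ("Great football match tonight", False),
]

# The seed rule is the event hashtag alone; the expanded set adds two
# hand-written conjunctive rules standing in for automatically discovered ones.
SEED_RULES = [frozenset({"#coronavirus"})]
EXPANDED_RULES = SEED_RULES + [
    frozenset({"covid", "cases"}),
    frozenset({"vaccine", "covid"}),
]

if __name__ == "__main__":
    print("seed:", precision_recall(SAMPLE, SEED_RULES))
    print("expanded:", precision_recall(SAMPLE, EXPANDED_RULES))
```

On this toy sample the seed hashtag alone collects only one of the three relevant messages, while the expanded rule set collects all three without admitting the unrelated "vaccine" message. This mirrors, in miniature, why a conjunctive rule can raise recall without sacrificing precision.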

Author information

Correspondence to Manoj K. Agarwal.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Agarwal, M.K., Baranawal, A., Simmhan, Y., Gupta, M. (2021). Event Related Data Collection from Microblog Streams. In: Strauss, C., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2021. Lecture Notes in Computer Science, vol 12924. Springer, Cham. https://doi.org/10.1007/978-3-030-86475-0_31

  • DOI: https://doi.org/10.1007/978-3-030-86475-0_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86474-3

  • Online ISBN: 978-3-030-86475-0

  • eBook Packages: Computer Science, Computer Science (R0)
