Abstract
Many studies have established that microblog streams such as Twitter and Weibo are leading indicators of emerging events. However, to statistically analyze the trends around these events in microblog message streams (e.g., popularity, sentiment, or aspects), one must first identify the messages related to an event with high precision and recall. In this paper, we pose the novel problem of automatically discovering meaningful keyword rules that identify the most relevant messages for a given event in fast-moving, high-volume social media streams. For a specified event, such as {#trump} or {#coronavirus}, our technique automatically extracts the most relevant keyword rules to collect related messages with high precision and recall. The rule set is dynamic: we continuously identify new rules that capture the evolution of the event. Experiments with millions of tweets show that the proposed rule extraction method is highly effective for event-related data collection, achieving precision of up to 99% and up to 4.5× the recall of the baseline system.
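To make the notions of keyword rules, precision, and recall concrete, the following is a minimal sketch and not the authors' method. It assumes a rule is a set of keywords that must all occur in a message, and that a message is collected if it satisfies at least one rule. The abstract does not specify the rule format, so this representation, the toy messages, the relevance labels, and all function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Assumption: a rule is a set of keywords that must ALL occur in a message.
Rule = FrozenSet[str]


def tokenize(message: str) -> Set[str]:
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return set(message.lower().split())


def matches(message: str, rules: Iterable[Rule]) -> bool:
    """A message is collected if it satisfies at least one rule."""
    tokens = tokenize(message)
    return any(rule <= tokens for rule in rules)


def collect(messages: List[str], rules: Iterable[Rule]) -> Set[int]:
    """Return the indices of the messages matched by the rule set."""
    rule_list = list(rules)
    return {i for i, m in enumerate(messages) if matches(m, rule_list)}


def precision_recall(collected: Set[int], relevant: Set[int]) -> Tuple[float, float]:
    """Precision = TP / |collected|, recall = TP / |relevant|."""
    true_positives = len(collected & relevant)
    precision = true_positives / len(collected) if collected else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Toy data (hypothetical): messages 0-2 are labeled as event-related.
messages = [
    "#coronavirus cases rise again",
    "new vaccine trial for covid announced",
    "lockdown extended as covid cases rise",
    "my cat knocked over the vaccine poster",
    "football results tonight",
]
relevant = {0, 1, 2}

# The seed rule is the event hashtag; the extra rules stand in for discovered ones.
seed_rules = [frozenset({"#coronavirus"})]
expanded_rules = seed_rules + [frozenset({"covid"}), frozenset({"vaccine", "trial"})]

seed_collected = collect(messages, seed_rules)
expanded_collected = collect(messages, expanded_rules)

print(precision_recall(seed_collected, relevant))
print(precision_recall(expanded_collected, relevant))
```

On this toy data the hashtag alone retrieves only one of the three relevant messages, while the expanded rule set retrieves all three without admitting the unrelated "vaccine poster" message, because the two-keyword rule requires both "vaccine" and "trial". This only illustrates why rule discovery can raise recall while holding precision; how rules are actually discovered and updated over time is described in the paper itself.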
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Agarwal, M.K., Baranawal, A., Simmhan, Y., Gupta, M. (2021). Event Related Data Collection from Microblog Streams. In: Strauss, C., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2021. Lecture Notes in Computer Science, vol 12924. Springer, Cham. https://doi.org/10.1007/978-3-030-86475-0_31
DOI: https://doi.org/10.1007/978-3-030-86475-0_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86474-3
Online ISBN: 978-3-030-86475-0
eBook Packages: Computer Science, Computer Science (R0)