Abstract
In this work, we model online event detection in microblogs as a stateful stream processing problem and offer a novel solution that balances result accuracy and performance. Our approach builds on two state-of-the-art algorithms. The first identifies bursty keywords inside blocks of blog messages. The second clusters blog messages by the similarity of their contents. To combine the computational simplicity of the keyword-based algorithm with the semantic accuracy of the clustering-based algorithm, we propose a new hybrid algorithm. We implement all three algorithms in a streaming manner on top of Apache Storm, augmented with Apache Cassandra for state management. Experiments with a dataset of 12M tweets show that our hybrid approach provides a better accuracy-performance compromise than the two previous approaches.
Notes
- 1. We use the Stanford NLP parser: https://nlp.stanford.edu/software/lex-parser.shtml.
- 2.
- 3.
- 4. The silhouette coefficient (SC) measures how similar a given object is to its own cluster compared to the other clusters. Its value ranges between \(-1\) and \(+1\), where a higher value indicates higher clustering quality.
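As an illustration of the silhouette coefficient described in note 4, the sketch below computes the mean SC using the standard definition \(s(i) = (b(i) - a(i)) / \max(a(i), b(i))\), where \(a(i)\) is the mean distance from object \(i\) to the other members of its own cluster and \(b(i)\) is the smallest mean distance from \(i\) to the members of any other cluster. The function name `silhouette`, the use of Euclidean distance, and the toy points are assumptions made for this example; the paper's own distance measure and implementation are not shown here and may differ.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance assumed)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    def mean_dist(i, members):
        # Mean distance from point i to the given members, excluding i itself.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            # Common convention: a singleton cluster contributes a score of 0.
            scores.append(0.0)
            continue
        a = mean_dist(i, own)
        b = min(mean_dist(i, m) for other, m in clusters.items() if other != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette(points, [0, 0, 1, 1])  # clusters match the geometry
bad = silhouette(points, [0, 1, 0, 1])   # clusters cut across the geometry
```

With these toy points, the labeling that matches the geometry scores close to \(+1\), while the labeling that cuts across it scores below zero, which is the sense in which a higher value indicates higher clustering quality.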
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Sahin, O.C., Karagoz, P., Tatbul, N. (2019). Streaming Event Detection in Microblogs: Balancing Accuracy and Performance. In: Bakaev, M., Frasincar, F., Ko, IY. (eds) Web Engineering. ICWE 2019. Lecture Notes in Computer Science, vol 11496. Springer, Cham. https://doi.org/10.1007/978-3-030-19274-7_10
DOI: https://doi.org/10.1007/978-3-030-19274-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-19273-0
Online ISBN: 978-3-030-19274-7
eBook Packages: Computer Science (R0)