Streaming Event Detection in Microblogs: Balancing Accuracy and Performance

  • Conference paper
  • Web Engineering (ICWE 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11496)

Abstract

In this work, we model online event detection in microblogs as a stateful stream processing problem and offer a novel solution that balances result accuracy and performance. Our approach builds on two state-of-the-art algorithms. The first identifies bursty keywords inside blocks of blog messages. The second clusters blog messages based on the similarity of their contents. To combine the computational simplicity of the keyword-based algorithm with the semantic accuracy of the clustering-based algorithm, we propose a new hybrid algorithm. We then implement all three algorithms in a streaming manner on top of Apache Storm, augmented with Apache Cassandra for state management. Experiments with a dataset of 12M tweets from Twitter show that our hybrid approach provides a better accuracy-performance compromise than the previous approaches.
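The keyword-based idea mentioned above can be illustrated with a minimal sketch. It is not the paper's algorithm: the function name `bursty_keywords`, the whitespace tokenization, and the `min_count` and `ratio` thresholds are all assumptions chosen for illustration. A word is treated as bursty when its count in the current block of messages is much higher than in the previous block.

```python
from collections import Counter


def bursty_keywords(previous_block, current_block, min_count=3, ratio=2.0):
    """Return words that are much more frequent in the current block.

    Hypothetical burst rule (an assumption, not the paper's criterion):
    a word is bursty if it occurs at least `min_count` times in the current
    block and at least `ratio` times its smoothed count in the previous block.
    """
    prev = Counter(w for msg in previous_block for w in msg.lower().split())
    curr = Counter(w for msg in current_block for w in msg.lower().split())
    return sorted(
        w for w, c in curr.items()
        if c >= min_count and c >= ratio * (prev[w] + 1)  # +1 smooths unseen words
    )


previous = ["good morning", "coffee time"]
current = ["earthquake now", "earthquake felt", "big earthquake", "good morning"]
print(bursty_keywords(previous, current))
```

A rule like this only needs per-block word counts, which is why keyword-based detection is computationally cheap compared with clustering whole messages by content similarity.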

Notes

  1. We use the Stanford NLP parser: https://nlp.stanford.edu/software/lex-parser.shtml.

  2. Apache Storm: http://storm.apache.org/.

  3. Apache Cassandra: http://cassandra.apache.org/.

  4. The silhouette coefficient (SC) essentially measures how similar a given object is to its own cluster compared to the other clusters. Its value ranges between \(-1\) and \(+1\), where a higher value indicates higher clustering quality.
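As a rough illustration of the silhouette coefficient described in the note above, the sketch below uses the standard definition: for each object, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the score is (b - a) / max(a, b). The Euclidean distance, the function name `silhouette_coefficient`, and the zero score for singleton clusters are assumptions for this example; the note does not say which distance measure the paper uses.

```python
import math


def silhouette_coefficient(points, labels):
    """Mean silhouette score over all points (Euclidean distance assumed)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        # Mean distance from point i to the members of a cluster, excluding i.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = mean_dist(i, own)
        b = min(mean_dist(i, m) for other, m in clusters.items() if other != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_coefficient(points, [0, 0, 1, 1])  # tight, well-separated clusters
bad = silhouette_coefficient(points, [0, 1, 0, 1])   # clusters mixed across groups
print(round(good, 3), round(bad, 3))
```

The well-separated labelling scores close to +1, while the mixed labelling scores below zero, matching the interpretation that higher values indicate higher clustering quality.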

Author information

Corresponding author

Correspondence to Pinar Karagoz.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Sahin, O.C., Karagoz, P., Tatbul, N. (2019). Streaming Event Detection in Microblogs: Balancing Accuracy and Performance. In: Bakaev, M., Frasincar, F., Ko, IY. (eds) Web Engineering. ICWE 2019. Lecture Notes in Computer Science, vol 11496. Springer, Cham. https://doi.org/10.1007/978-3-030-19274-7_10

  • DOI: https://doi.org/10.1007/978-3-030-19274-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-19273-0

  • Online ISBN: 978-3-030-19274-7

  • eBook Packages: Computer Science (R0)
