Social Stream Clustering to Improve Events Extraction

  • Conference paper
Intelligent Decision Technologies 2017 (IDT 2017)

Abstract

Event extraction from social media data is a challenging task because of the data's volume, velocity and informality. In previous work [25], we proposed a successful approach for event extraction from social data. However, messages were processed individually, which generated many meaningless events because the missing details were scattered across millions of text segments. In addition, many unnecessary texts were analyzed, which increased processing time and decreased the performance of the system.

In this paper, we aim to address the above-mentioned weaknesses and improve the performance of the system. We propose clustering to group semantically related text segments, filter noise, reduce the volume of data to process, and forward only relevant text segments to the information extraction pipeline. We port the clustering algorithm to a stream processing framework, namely Apache Storm, in order to build a stream clustering solution that scales to continuously growing volumes of data.
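
The abstract does not specify which clustering algorithm is used, so the following is only a minimal sketch of the general idea and not the authors' method. It assumes a generic single-pass (leader-style) clustering over bag-of-words vectors with cosine similarity, which is a swapped-in technique, and it runs in plain Python rather than on Storm. The names `StreamClusterer`, `threshold` and `min_size`, their default values, and the sample segments are hypothetical. Each incoming segment joins its most similar cluster when the similarity reaches `threshold`; otherwise it opens a new cluster. Only clusters with at least `min_size` members are promoted to extraction, so isolated segments are filtered out as noise.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for real tweet preprocessing."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class StreamClusterer:
    """Single-pass (leader-style) clustering of an unbounded text stream.

    Hypothetical parameters (not taken from the paper): `threshold` is the
    minimum cosine similarity needed to join an existing cluster; `min_size`
    is how many segments a cluster needs before it is promoted to the
    extraction pipeline. Smaller clusters are treated as noise.
    """

    def __init__(self, threshold=0.3, min_size=2):
        self.threshold = threshold
        self.min_size = min_size
        self.centroids = []  # summed term-frequency vector per cluster
        self.members = []    # text segments assigned to each cluster

    def add(self, text):
        """Assign one incoming segment to its closest cluster or open a new one."""
        vec = Counter(tokenize(text))
        best, best_sim = None, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            self.centroids[best].update(vec)  # Counter.update adds the counts
            self.members[best].append(text)
            return best
        self.centroids.append(vec)
        self.members.append([text])
        return len(self.centroids) - 1

    def promoted(self):
        """Clusters large enough to forward; smaller ones are dropped as noise."""
        return [m for m in self.members if len(m) >= self.min_size]


if __name__ == "__main__":
    # Invented sample segments, for illustration only.
    stream = [
        "police seize heroin shipment at border",
        "heroin shipment seized by police near border",
        "great weather today",
        "border police seize large heroin haul",
    ]
    clusterer = StreamClusterer(threshold=0.3, min_size=2)
    for segment in stream:
        clusterer.add(segment)
    for cluster in clusterer.promoted():
        print(len(cluster), cluster[0])  # prints: 3 police seize heroin shipment at border
```

Each segment is compared only against cluster centroids, so the cost per message grows with the number of open clusters rather than with the number of messages seen so far. A production version would also need to age out or merge stale clusters. In a Storm topology, this logic would plausibly sit inside a bolt; both points are assumptions that go beyond what the abstract states.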

Notes

  1. http://storm.apache.org/

References

  1. Guha, S., Meyerson, A., Mishra, N., Motwani, R., O’Callaghan, L.: Clustering data streams: theory and practice. IEEE TKDE 15(3), 515–528 (2003)

  2. Gama, J.: Knowledge Discovery from Data Streams. Chapman and Hall/CRC, Boca Raton (2010)

  3. Aggarwal, C., Han, J., Wang, J., Yu, P.S.: A framework for clustering evolving data streams. In: Proceedings of VLDB, pp. 81–92 (2003)

  4. Guha, S., Mishra, N., Motwani, R., O’Callaghan, L.: Clustering data streams. In: IEEE Symposium on Foundations of Computer Science, pp. 359–366. IEEE Computer Society (2000)

  5. Baralis, E., Cerquitelli, T., Chiusano, S., Grimaudo, L., Xiao, X.: Analysis of Twitter data using a multiple-level clustering strategy. In: Third International Conference on Model and Data Engineering (MEDI 2013), Amantea, Italy, 25–27 September, pp. 13–24 (2013)

  6. Kranen, P., Assent, I., Baldauf, C., Seidl, T.: The ClusTree: indexing micro-clusters for anytime stream mining. Knowl. Inf. Syst. 29, 249–272 (2011). doi:10.1007/s10115-010-0342-8

  7. Ifrim, G., Shi, B., Brigadir, I.: Event detection in Twitter using aggressive filtering and hierarchical tweet clustering. In: Second Workshop on Social News on the Web (SNOW), Seoul, Korea. ACM Publisher (2014)

  8. Gao, D., Zhang, R., Li, W., Hou, Y.: Twitter hyperlink recommendation with user-tweet-hyperlink three-way clustering. In: CIKM 2012, Maui, HI, USA (2012)

  9. Tanev, H., Piskorski, J., Atkinson, M.: Real-time news event extraction for global monitoring systems. In: Joint Research Center of the European Commission, Web and Language Technology Group of IPSC, T.P. 267, Via Fermi 1, 21020 Ispra, VA, Italy (2008)

  10. Zhou, D., Chen, L., He, Y.: An unsupervised framework of exploring events on Twitter: filtering, extraction and categorization. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (2015)

  11. Georgescu, M., Kanhabua, N., Krause, D., Nejdl, W., Siersdorfer, S.: Extracting event-related information from article updates in Wikipedia. L3S Research Center, Appelstr. 9a, Hannover 30167, Germany (2012)

  12. Li, H., Li, X., Ji, H., Marton, Y.: Domain-independent novel event discovery and semi-automatic event annotation (2010)

  13. Zhang, Y., Xu, C., Rui, Y., Wang, J., Lu, H.: Semantic event extraction from basketball games using multi-modal analysis (2006)

  14. Rusu, D., Hodson, J., Kimball, A.: Unsupervised techniques for extracting and clustering complex events in news. In: Proceedings of the 2nd Workshop on EVENTS: Definition, Detection, Coreference, and Representation, Baltimore, Maryland, USA, 22–27 June, pp. 26–34. Association for Computational Linguistics (2014)

  15. Zhang, C., Soderland, S., Weld, D.: Exploiting parallel news streams for unsupervised event extraction (2013)

  16. Mehryary, F., Kaewphan, S., Hakala, K., Ginter, F.: Eliminating Incorrect Events from Large-Scale Event Networks by Trigger Word Clustering and Pruning. The University of Turku Graduate School (UTUGS), University of Turku, Finland (2013)

  17. Poibeau, T., et al. (eds.): Multi-source, Multilingual Information Extraction and Summarization. Theory and Applications of Natural Language Processing. Springer, Heidelberg (2013). doi:10.1007/978-3-642-28569-1. Chapter 2, J. Piskorski and R. Yangarber

  18. Valenzuela-Escarcega, M., Hahn-Powell, G., Hicks, T., Surdeanu, M.: A domain-independent rule-based framework for event extraction. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing: Software Demonstrations (ACL-IJCNLP) (2015)

  19. Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S.J., McClosky, D.: The Stanford CoreNLP Natural Language Processing Toolkit (2014)

  20. Piskorski, J., Tanev, H., Atkinson, M., Van der Goot, E.: Cluster-Centric Approach to News Event Extraction. Joint Research Centre of the European Commission Institute for the Protection and Security of the Citizen Via Fermi 2749, 21027 Ispra, Italy (2010)

  21. Cao, F., Ester, M., Qian, W., Zhou, A.: Density-based clustering over an evolving data stream with noise, pp. 326–337 (2004)

  22. Chen, Y., Tu, L.: Density-based clustering for real-time stream data. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2007, pp. 133–142. ACM Press (2007)

  23. Aggarwal, C.C., Subbian, K.: Event Detection in Social Streams. IBM T. J. Watson Research Center, Hawthorne, NY, USA; Department of Computer Science & Engineering, University of Minnesota, Twin Cities, MN, USA (2011)

  24. Aggarwal, C.C., Han, J., Wang, J., Yu, P.S.: A framework for on-demand classification of evolving data streams. IEEE TKDE 18(5), 577–589 (2006)

  25. Jenhani, F., Gouider, M.S., Ben Said, L.: A hybrid approach for drug abuse events extraction from Twitter. In: 20th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (ICKIIES 2016), York, United Kingdom, pp. 1032–1040 (2016)

Author information

Correspondence to Ferdaous Jenhani.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Jenhani, F., Gouider, M.S., Said, L.B. (2018). Social Stream Clustering to Improve Events Extraction. In: Czarnowski, I., Howlett, R., Jain, L. (eds) Intelligent Decision Technologies 2017. IDT 2017. Smart Innovation, Systems and Technologies, vol 73. Springer, Cham. https://doi.org/10.1007/978-3-319-59424-8_30

  • DOI: https://doi.org/10.1007/978-3-319-59424-8_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59423-1

  • Online ISBN: 978-3-319-59424-8

  • eBook Packages: Engineering, Engineering (R0)
