Abstract
Extracting events from social media data is difficult because of the data's volume, velocity and informality. In previous work [25], we proposed a successful approach for event extraction from social data. However, messages were processed individually, which generated many meaningless events because the missing details were scattered across millions of text segments. In addition, many irrelevant texts were analyzed, which increased processing time and degraded the performance of the system.
In this paper, we address these weaknesses and improve the performance of the system. We propose clustering to group semantically related text segments, filter noise, reduce the volume of data to process, and promote only relevant text segments to the information extraction pipeline. We port the clustering algorithm to a stream processing framework, namely Storm, to build a stream clustering solution that scales to continuously growing volumes of data.
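The idea of grouping related text segments and promoting only the relevant ones can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, so what follows is only an assumption: a generic single-pass clusterer that uses cosine similarity over term counts as a stand-in for semantic relatedness. The class name `OnlineClusterer`, the parameters `sim_threshold` and `min_size`, and the sample messages are all hypothetical, and the sketch does not use Storm.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a message and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    if not a or not b:
        return 0.0
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)


class OnlineClusterer:
    """Hypothetical single-pass clusterer: each message is seen exactly once."""

    def __init__(self, sim_threshold=0.3, min_size=2):
        self.sim_threshold = sim_threshold  # assumed value, not from the paper
        self.min_size = min_size            # clusters smaller than this count as noise
        self.clusters = []                  # each: {"centroid": Counter, "members": [str]}

    def add(self, text):
        """Assign a message to its most similar cluster, or open a new one."""
        vec = Counter(tokenize(text))
        best, best_sim = None, 0.0
        for idx, cluster in enumerate(self.clusters):
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = idx, sim
        if best is not None and best_sim >= self.sim_threshold:
            # The centroid is the running sum of member term counts.
            self.clusters[best]["centroid"].update(vec)
            self.clusters[best]["members"].append(text)
            return best
        self.clusters.append({"centroid": vec, "members": [text]})
        return len(self.clusters) - 1

    def relevant(self):
        """Return only clusters large enough to pass on to extraction."""
        return [c["members"] for c in self.clusters
                if len(c["members"]) >= self.min_size]


stream = [
    "earthquake hits the city center this morning",
    "strong earthquake hits city center",
    "lol just had coffee",
    "earthquake damage reported in the city center",
]
clusterer = OnlineClusterer()
for message in stream:
    clusterer.add(message)
promoted = clusterer.relevant()
```

In this sketch the three earthquake messages end up in one cluster and are promoted together, while the unrelated message stays in a singleton cluster and is filtered out as noise. A real deployment would also need a semantic similarity measure and a policy for expiring old clusters, neither of which is shown here.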
References
Guha, S., Meyerson, A., Mishra, N., Motwani, R., O’Callaghan, L.: Clustering data streams: theory and practice. IEEE TKDE 15(3), 515–528 (2003)
Gama, J.: Knowledge Discovery from Data Streams. Chapman and Hall/CRC, Boca Raton (2003)
Aggarwal, C., Han, J., Wang, J., Yu, P.S.: A framework for clustering evolving data streams. In: Proceedings of VLDB, pp. 81–92 (2003)
Guha, S., Mishra, N., Motwani, R., O’Callaghan, L.: Clustering data streams. In: IEEE Symposium on Foundations of Computer Science, pp. 359–366. IEEE Computer Society (2000)
Baralis, E., Cerquitelli, T., Chiusano, S., Grimaudo, L., Xiao, X.: Analysis of Twitter data using a multiple-level clustering strategy. In: Third International Conference on Model and Data Engineering (MEDI 2013), Amantea, Italy, 25–27 September, pp. 13–24 (2013)
Kranen, P., Assent, I., Baldauf, C., Seidl, T.: The ClusTree: indexing micro-clusters for anytime stream mining. Knowl. Inf. Syst. 29, 249–272 (2011). doi:10.1007/s10115-010-0342-8
Ifrim, G., Shi, B., Brigadir, I.: Event detection in Twitter using aggressive filtering and hierarchical tweet clustering. In: Second Workshop on Social News on the Web (SNOW), Seoul, Korea. ACM Publisher (2014)
Gao, D., Zhang, R., Li, W., Hou, Y.: Twitter hyperlink recommendation with user-tweet-hyperlink three-way clustering. In: CIKM 2012, Maui, HI, USA (2012)
Tanev, H., Piskorski, J., Atkinson, M.: Real-time news event extraction for global monitoring systems. In: Joint Research Center of the European Commission, Web and Language Technology Group of IPSC, T.P. 267, Via Fermi 1, 21020 Ispra, VA, Italy (2008)
Zhou, D., Chen, L., He, Y.: An unsupervised framework of exploring events on Twitter: filtering, extraction and categorization. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (2015)
Georgescu, M., Kanhabua, N., Krause, D., Nejdl, W., Siersdorfer, S.: Extracting event-related information from article updates in Wikipedia. L3S Research Center, Appelstr. 9a, Hannover 30167, Germany (2012)
Li, H., Li, X., Ji, H., Marton, Y.: Domain-independent novel event discovery and semi-automatic event annotation (2010)
Zhang, Y., Xu, C., Rui, Y., Wang, J., Lu, H.: Semantic event extraction from basketball games using multi-modal analysis (2006)
Rusu, D., Hodson, J., Kimball, A.: Unsupervised techniques for extracting and clustering complex events in news. In: Proceedings of the 2nd Workshop on EVENTS: Definition, Detection, Coreference, and Representation, Baltimore, Maryland, USA, 22–27 June, pp. 26–34. Association for Computational Linguistics (2014)
Zhang, C., Soderland, S., Weld, D.: Exploiting parallel news streams for unsupervised event extraction (2013)
Mehryary, F., Kaewphan, S., Hakala, K., Ginter, F.: Eliminating Incorrect Events from Large-Scale Event Networks by Trigger Word Clustering and Pruning. The University of Turku Graduate School (UTUGS), University of Turku, Finland (2013)
Poibeau, T., et al. (eds.): Multi-source, Multilingual Information Extraction and Summarization. Theory and Applications of Natural Language Processing. Springer, Heidelberg (2013). doi:10.1007/978-3-642-28569-1. Chapter 2, J. Piskorski and R. Yangarber
Valenzuela-Escarcega, M., Hahn-Powell, G., Hicks, T., Surdeanu, M.: A domain-independent rule-based framework for event extraction. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing: Software Demonstrations (ACL-IJCNLP) (2015)
Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S.J., McClosky, D.: The Stanford CoreNLP Natural Language Processing Toolkit (2014)
Piskorski, J., Tanev, H., Atkinson, M., Van der Goot, E.: Cluster-Centric Approach to News Event Extraction. Joint Research Centre of the European Commission Institute for the Protection and Security of the Citizen Via Fermi 2749, 21027 Ispra, Italy (2010)
Cao, F., Ester, M., Qian, W., Zhou, A.: Density-based clustering over an evolving data stream with noise, pp. 326–337 (2004)
Chen, Y., Tu, L.: Density-based clustering for real-time stream data. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2007, pp. 133–142. ACM Press (2007)
Aggarwal, C.C., Subbian, K.: Event Detection in Social Streams. IBM T. J. Watson Research Center, Hawthorne, NY, USA, and Department of Computer Science & Engineering, University of Minnesota, Twin Cities, MN, USA (2011)
Aggarwal, C.C., Han, J., Wang, J., Yu, P.S.: A framework for on-demand classification of evolving data streams. IEEE TKDE 18(5), 577–589 (2006)
Jenhani, F., Gouider, M.S., Ben Said, L.: A hybrid approach for drug abuse events extraction from Twitter. In: 20th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (ICKIIES 2016), York, United Kingdom, pp. 1032–1040 (2016)
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Jenhani, F., Gouider, M.S., Said, L.B. (2018). Social Stream Clustering to Improve Events Extraction. In: Czarnowski, I., Howlett, R., Jain, L. (eds) Intelligent Decision Technologies 2017. IDT 2017. Smart Innovation, Systems and Technologies, vol 73. Springer, Cham. https://doi.org/10.1007/978-3-319-59424-8_30
DOI: https://doi.org/10.1007/978-3-319-59424-8_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59423-1
Online ISBN: 978-3-319-59424-8
eBook Packages: Engineering, Engineering (R0)