Abstract
Identifying news events and relating current news to past or already identified events is an open challenge for news agencies. In this paper, I propose a study on identifying events from semantic RDF graph representations of real-time, big-data streams of news and pre-news. The proposed solution must maintain acceptable accuracy over time and meet the requirements of incremental clustering, big data, and real-time streams. To design such a solution, I will study which clustering approaches are best suited to this purpose, including methods for clustering RDF graphs based on machine learning and on “classical” algorithmic approaches. I also present three different evaluation approaches.
Supported by the News Angler project funded by the Norwegian Research Council’s IKTPLUSS programme as project 275872.
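To make the idea of incremental clustering over a stream of RDF graphs concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes each incoming news item has already been lifted to a small set of (subject, predicate, object) triples, and it uses a simple single-pass "leader" scheme with Jaccard similarity. The names `EventCluster`, `jaccard`, `assign`, the `ex:` identifiers, and the threshold value are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Assumption: an RDF graph is modelled as a plain set of string triples.
Triple = Tuple[str, str, str]


@dataclass
class EventCluster:
    """Hypothetical container for one candidate event."""
    triples: Set[Triple] = field(default_factory=set)  # union of member graphs
    members: List[str] = field(default_factory=list)   # ids of member graphs


def jaccard(a: Set[Triple], b: Set[Triple]) -> float:
    """Jaccard similarity of two triple sets (0.0 when both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def assign(clusters: List[EventCluster], graph_id: str,
           graph: Set[Triple], threshold: float = 0.3) -> int:
    """Assign one incoming graph to the most similar cluster, or open a new one.

    Each graph is processed once, in arrival order, so earlier graphs are
    never revisited. Returns the index of the cluster that received the graph.
    """
    best_idx, best_sim = -1, 0.0
    for i, cluster in enumerate(clusters):
        sim = jaccard(graph, cluster.triples)
        if sim > best_sim:
            best_idx, best_sim = i, sim

    if best_idx == -1 or best_sim < threshold:
        clusters.append(EventCluster())
        best_idx = len(clusters) - 1

    clusters[best_idx].triples |= graph
    clusters[best_idx].members.append(graph_id)
    return best_idx


# Illustrative stream of three tiny graphs (made-up identifiers).
g1 = {("ex:Flood", "ex:location", "ex:Oslo"),
      ("ex:Flood", "ex:date", "2021-06-01")}
g2 = g1 | {("ex:Flood", "ex:reportedBy", "ex:AgencyA")}
g3 = {("ex:Election", "ex:location", "ex:Madrid")}

clusters: List[EventCluster] = []
labels = [assign(clusters, name, g)
          for name, g in [("g1", g1), ("g2", g2), ("g3", g3)]]
```

In this sketch, a cluster is summarised by the union of its members' triples, which keeps the per-item cost proportional to the number of open clusters. A real-time, big-data setting would likely need indexing, cluster expiry, and a richer graph similarity measure, none of which are shown here.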
Acknowledgements
Thesis supervised by Prof. Andreas L. Opdahl and co-supervised by Bjørnar Tessem.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Gallofré Ocaña, M. (2021). Identifying Events from Streams of RDF-Graphs Representing News and Social Media Messages. In: Verborgh, R., et al. The Semantic Web: ESWC 2021 Satellite Events. ESWC 2021. Lecture Notes in Computer Science, vol 12739. Springer, Cham. https://doi.org/10.1007/978-3-030-80418-3_31
DOI: https://doi.org/10.1007/978-3-030-80418-3_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80417-6
Online ISBN: 978-3-030-80418-3
eBook Packages: Computer Science, Computer Science (R0)