Abstract
This chapter introduces two key tools for journalists. Before they can begin verifying an online video, journalists need to determine which news story the video concerns and to find candidate online videos around that story. To this end, we assessed prior research in topic detection and developed a keyword graph-based method for discovering news stories in Twitter streams. We then developed a technique for selecting online videos that are candidates for those news stories, using the detected stories to form queries against social networks. This enables relevant information retrieval at Web scale for news story-associated videos. We present both techniques and the results of their evaluation, based on observation of the detected stories and of the news videos returned for those stories, demonstrating state-of-the-art precision and recall that let journalists quickly identify videos for verification and re-use.
Notes
- 6. http://wikipedia-live-monitor.herokuapp.com, last checked and operational on 20 March 2019.
- 8. https://www.wikidata.org/wiki/Q11696 at the time of writing has the property “officeholder” and its value is the entity for “Donald Trump”. Of course by the time you read this, the value may have changed.
Acknowledgements
The work described in this chapter would not have been possible without the efforts and ideas of many other colleagues over the years. In particular, we acknowledge Walter Rafelsberger who initiated the story clustering implementation; Svitlana Vakulenko who first experimented with story detection on Twitter streams; Shu Zhu who contributed to the cluster merging, splitting, and burst detection; as well as Roland Pajuste who cleaned up the resulting code and worked on quality improvements and optimizations to make it more efficient.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Nixon, L., Fischl, D., Scharl, A. (2019). Real-Time Story Detection and Video Retrieval from Social Media Streams. In: Mezaris, V., Nixon, L., Papadopoulos, S., Teyssou, D. (eds) Video Verification in the Fake News Era. Springer, Cham. https://doi.org/10.1007/978-3-030-26752-0_2
DOI: https://doi.org/10.1007/978-3-030-26752-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26751-3
Online ISBN: 978-3-030-26752-0
eBook Packages: Computer Science (R0)