Real-Time Story Detection and Video Retrieval from Social Media Streams

Nixon, Lyndon; Fischl, Daniel; Scharl, Arno

doi:10.1007/978-3-030-26752-0_2

Abstract

This chapter introduces two key tools for journalists. Before they can begin verifying an online video, journalists need to determine the news story that the video is about and to find candidate online videos related to that story. To this end, we assessed prior research on topic detection and developed a keyword graph-based method for discovering news stories in Twitter streams. We then developed a technique for selecting online videos that are candidates for those news stories, using the detected stories to form queries against social networks. This enables relevant information retrieval at Web scale for videos associated with news stories. We present both techniques and evaluate them by observing the detected stories and the news videos returned for those stories, demonstrating state-of-the-art precision and recall that allow journalists to quickly identify videos for verification and re-use.
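
As a rough illustration of the general idea of keyword graph-based story discovery followed by query formation, the sketch below may help. It is not the chapter's implementation: the tokenizer, the stopword list, the `min_cooccurrence` threshold, the use of plain connected components to group keywords, and the function names `keywords`, `detect_stories` and `story_query` are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Tiny illustrative stopword list (an assumption, not taken from the chapter).
STOPWORDS = {"the", "a", "an", "in", "of", "to", "and", "is", "on", "for", "at"}


def keywords(text):
    """Return the sorted, unique keywords of a post (lowercased, punctuation stripped)."""
    tokens = (t.strip(".,!?:;\"'()#@").lower() for t in text.split())
    return sorted({t for t in tokens if len(t) > 2 and t not in STOPWORDS})


def detect_stories(posts, min_cooccurrence=2):
    """Group keywords into stories: connected components of a pruned co-occurrence graph."""
    # Count how many posts each keyword pair appears in together.
    edge_counts = Counter()
    for post in posts:
        for a, b in combinations(keywords(post), 2):
            edge_counts[(a, b)] += 1

    # Keep only edges supported by at least `min_cooccurrence` posts.
    graph = defaultdict(set)
    for (a, b), count in edge_counts.items():
        if count >= min_cooccurrence:
            graph[a].add(b)
            graph[b].add(a)

    # Depth-first search to collect each connected component as one story.
    stories, seen = [], set()
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        stories.append(sorted(component))
    return stories


def story_query(story, max_terms=3):
    """Form a simple keyword query from a story (hypothetical query format)."""
    return " ".join(story[:max_terms])


posts = [
    "Earthquake hits Lisbon, buildings damaged",
    "Lisbon earthquake: rescue teams arrive",
    "Election results announced in Vienna",
    "Vienna election results delayed",
    "Nice weather today",
]
stories = detect_stories(posts)
queries = [story_query(s) for s in stories]
```

With these sample posts, keyword pairs that co-occur in only one post are pruned, so the unrelated weather post contributes no story. A production system would more likely use weighted edges, a community detection algorithm, and burst detection over time windows, and would pass the resulting query to each social network's own search interface.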

Acknowledgements

The work described in this chapter would not have been possible without the efforts and ideas of many other colleagues over the years. In particular, we acknowledge Walter Rafelsberger, who initiated the story clustering implementation; Svitlana Vakulenko, who first experimented with story detection on Twitter streams; Shu Zhu, who contributed to the cluster merging, splitting, and burst detection; and Roland Pajuste, who cleaned up the resulting code and worked on quality improvements and optimizations to make it more efficient.

Author information

Authors and Affiliations

MODUL Technology GmbH, Vienna, Austria
Lyndon Nixon & Daniel Fischl
webLyzard technology gmbh, Vienna, Austria
Arno Scharl

Corresponding author

Correspondence to Lyndon Nixon.

Editor information

Editors and Affiliations

Centre for Research and Technology Hellas, Information Technologies Institute, Thermi, Thessaloniki, Greece
Vasileios Mezaris
MODUL Technology GmbH, MODUL University Vienna, Vienna, Austria
Lyndon Nixon
Centre for Research and Technology Hellas, Information Technologies Institute, Thermi, Thessaloniki, Greece
Symeon Papadopoulos
Agence France-Presse, Paris, France
Denis Teyssou

About this chapter

Cite this chapter

Nixon, L., Fischl, D., Scharl, A. (2019). Real-Time Story Detection and Video Retrieval from Social Media Streams. In: Mezaris, V., Nixon, L., Papadopoulos, S., Teyssou, D. (eds) Video Verification in the Fake News Era. Springer, Cham. https://doi.org/10.1007/978-3-030-26752-0_2

DOI: https://doi.org/10.1007/978-3-030-26752-0_2
Published: 18 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26751-3
Online ISBN: 978-3-030-26752-0
eBook Packages: Computer Science, Computer Science (R0)
