Real-Time Story Detection and Video Retrieval from Social Media Streams

Chapter in: Video Verification in the Fake News Era

Abstract

This chapter introduces two key tools for journalists. Before they can begin verifying an online video, journalists need to determine which news story the video is about, and they need to find candidate online videos around that story. To this end, we assessed prior research in topic detection and developed a keyword graph-based method for discovering news stories in Twitter streams. We then developed a technique for selecting online videos that are candidates for a news story, using the detected stories to form queries against social networks. This enables Web-scale retrieval of videos associated with news stories. We present these techniques together with evaluations based on inspection of the detected stories and of the news videos retrieved for them, demonstrating state-of-the-art precision and recall, so that journalists can quickly identify videos for verification and re-use.
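
The abstract names two steps: grouping co-occurring keywords from a tweet stream into stories, and turning each story into a search query for candidate videos. The sketch below illustrates both steps under stated assumptions; it is not the chapter's implementation. It assumes tweets arrive as plain strings from a single time window, treats two keywords as linked when they co-occur in at least `min_cooccurrence` tweets, and uses connected components as a simple stand-in for the community detection (for example Louvain) that a production system would more likely use. The function names, the stopword list and the thresholds are all hypothetical.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical, deliberately tiny stopword list; a real system would use a
# language-specific list and named entity recognition instead.
STOPWORDS = {"the", "and", "are", "was", "this", "that", "for", "with", "from", "http", "https"}


def extract_keywords(text):
    """Return the set of candidate keywords in one tweet.

    Hashtags are folded into plain words; tokens of two characters or fewer
    and tokens in STOPWORDS are dropped.
    """
    tokens = (t.lstrip("#") for t in re.findall(r"#?\w+", text.lower()))
    return {t for t in tokens if len(t) > 2 and t not in STOPWORDS}


def build_keyword_graph(tweets, min_cooccurrence=2):
    """Build an undirected keyword co-occurrence graph.

    Returns (adjacency, weights): adjacency maps each keyword to its
    neighbours; weights maps a sorted keyword pair to the number of tweets
    containing both. Pairs seen fewer than min_cooccurrence times are dropped.
    """
    counts = Counter()
    for tweet in tweets:
        for pair in combinations(sorted(extract_keywords(tweet)), 2):
            counts[pair] += 1
    weights = {pair: n for pair, n in counts.items() if n >= min_cooccurrence}
    adjacency = defaultdict(set)
    for a, b in weights:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency, weights


def detect_stories(adjacency):
    """Group keywords into candidate stories (connected components, via DFS)."""
    seen, stories = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        stories.append(component)
    return stories


def story_query(story, weights, top_k=3):
    """Form a search query from a story's most strongly connected keywords.

    Keywords are ranked by weighted degree inside the story, ties broken
    alphabetically so the query is deterministic.
    """
    strength = Counter()
    for (a, b), n in weights.items():
        if a in story and b in story:
            strength[a] += n
            strength[b] += n
    ranked = sorted(strength.items(), key=lambda kv: (-kv[1], kv[0]))
    return " ".join(keyword for keyword, _ in ranked[:top_k])


if __name__ == "__main__":
    tweets = [
        "Earthquake hits Tokyo #earthquake",
        "Strong earthquake in Tokyo this morning",
        "Tokyo earthquake: buildings shaking",
        "Election results announced in Brazil",
        "Brazil election results are in",
        "Lunch was great today",
    ]
    adjacency, weights = build_keyword_graph(tweets)
    for story in detect_stories(adjacency):
        print(story_query(story, weights))
```

For the sample tweets this prints `earthquake tokyo` and then `brazil election results`; the off-topic tweet never reaches the co-occurrence threshold. In a pipeline of the kind the abstract describes, each query string would then be sent to a video platform's search interface; which platforms and which query syntax to use is left open here. Connected components will merge two stories that share a single strongly linked keyword, which is one reason real systems add community detection and explicit cluster merging and splitting.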

Notes

  1. https://www.mynewsdesk.com/blog/how-journalists-use-social-media
  2. https://revealproject.eu/how-to-find-breaking-news-on-twitter
  3. http://emm.newsbrief.eu/NewsBrief/clusteredition/en/24hrs.html
  4. http://www.gdeltproject.org
  5. http://eventregistry.org
  6. http://wikipedia-live-monitor.herokuapp.com, last checked and operational on 20 March 2019.
  7. https://lemon-model.net
  8. At the time of writing, https://www.wikidata.org/wiki/Q11696 has the property “officeholder”, whose value is the entity for “Donald Trump”. Of course, by the time you read this, the value may have changed.
  9. https://twitter.com/lyndonjbnixon/lists/breaking-news/members
  10. http://twitterlist.ots.at
  11. Published at http://revealproject.eu/how-to-find-breaking-news-on-twitter.
  12. https://www.journalism.org/2017/09/07/news-use-across-social-media-platforms-2017

Acknowledgements

The work described in this chapter would not have been possible without the efforts and ideas of many other colleagues over the years. In particular, we acknowledge Walter Rafelsberger who initiated the story clustering implementation; Svitlana Vakulenko who first experimented with story detection on Twitter streams; Shu Zhu who contributed to the cluster merging, splitting, and burst detection; as well as Roland Pajuste who cleaned up the resulting code and worked on quality improvements and optimizations to make it more efficient.

Author information

Corresponding author

Correspondence to Lyndon Nixon.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Nixon, L., Fischl, D., Scharl, A. (2019). Real-Time Story Detection and Video Retrieval from Social Media Streams. In: Mezaris, V., Nixon, L., Papadopoulos, S., Teyssou, D. (eds) Video Verification in the Fake News Era. Springer, Cham. https://doi.org/10.1007/978-3-030-26752-0_2

  • DOI: https://doi.org/10.1007/978-3-030-26752-0_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26751-3

  • Online ISBN: 978-3-030-26752-0

  • eBook Packages: Computer Science, Computer Science (R0)